ABSTRACT

Hidden Markov Chains (HMCs) are commonly used mathematical
models of probabilistic systems. They are employed in various
fields such as speech recognition, signal processing, and
biological sequence analysis. Motivated by applications in
stochastic runtime verification, we consider the problem of
distinguishing two given HMCs based on a single observation
sequence that one of the HMCs generates. More precisely, given
two HMCs and an observation sequence, a distinguishing
algorithm is expected to identify the HMC that generates the
observation sequence. Two HMCs are called distinguishable if
for every $\varepsilon > 0$ there is a distinguishing algorithm
whose error probability is less than $\varepsilon$. We show that
one can decide in polynomial time whether two HMCs are
distinguishable. Further, we present and analyze two
distinguishing algorithms for distinguishable HMCs. The first
algorithm makes a decision after processing a fixed number of
observations, and it exhibits two-sided error. The second
algorithm processes an unbounded number of observations, but
it has only one-sided error. For both algorithms, the error
probability decays exponentially with the number of processed
observations. We also provide an algorithm for distinguishing
multiple HMCs.
1. INTRODUCTION

Hidden Markov Chains (HMCs) are commonly used mathematical
models of probabilistic systems. They are specified by a Markov
chain, capturing the probabilistic behavior of a system, and an
observation function specifying the outputs generated from each
of its states. Figure 1 depicts two example HMCs $H_1, H_2$,
with observations $a$ and $b$. We consider finite-state HMCs in
this paper. An HMC randomly generates a (conceptually infinite)
string of observations. The states producing the observations
are not observable (note that $s_0$ and $s_1$ output the same
observation $a$ in the example). This motivates the term hidden.

HMCs are widely employed in fields such as speech recognition
(see [22] for a tutorial), gesture recognition [B], musical
score following [23], signal processing [3, and climate
modeling [I]. HMCs are heavily used in computational biology
[12], more specifically in DNA modeling [8] and biological
sequence analysis [U, including protein structure prediction
[18], detecting similarities in genomes [14] and gene finding
[2]. Following [19], applications of HMCs are based on two
basic problems, cf. [TBl Chapter 2]. The first one is: given an
observation string and an HMC, what is the most likely sequence
of states that produced the string? This is useful for areas
like speech recognition; see [22] for efficient algorithms
based on dynamic programming. The second problem is: given an
observation string and multiple HMCs, identify the HMC that is
most likely to produce the observation. This is used for
classification.

The second problem raises a fundamental question, which we
address in this work: Given two HMCs, and assuming that one of
them produces a single random observation sequence, is it even
possible to identify the producing HMC with high probability?
And if yes, how many observations in that observation sequence
are needed? At its heart, this question is about comparing two
HMCs in terms of their (distributions on) observation
sequences. To make this more precise, let a monitor for two
given HMCs $H_1, H_2$ be an algorithm that reads (increasing
prefixes of) a single observation sequence, and at some point
outputs “$H_1$” or “$H_2$”. The distinguishability problem asks,
for two given HMCs $H_1, H_2$, whether for all
$\varepsilon > 0$ there is a monitor such that for both
$i = 1,2$, if the monitor reads a random observation sequence
produced by $H_i$, then with probability at least
$1 - \varepsilon$ the monitor outputs “$H_i$”.

A related problem is equivalence of HMCs. Two HMCs are called
equivalent if they produce the same (prefixes of) observation
sequences with the same probability. Equivalence of HMCs has
been well-studied and can be decided in polynomial time, using
algorithms based on linear algebra, see e.g. [15, 29, 10]. The
exact relation between equivalence and distinguishability
depends on whether a monitor has access to a single random
observation sequence or to multiple such sequences.

Figure 1: Two HMCs. Here $H_1$ and $H_2$ are distinguishable (see Example 2) and hence not equivalent.

(1) Consider first a notion of a “monitor” that has access to
several random observation sequences, each generated starting
from the same initial state. Call this a multi-monitor. If the
two given HMCs are equivalent then even a multi-monitor can
only guess. Now assume the two HMCs are not equivalent. It is
known (see e.g. [29]) that then there exists a linear-length
prefix of the observation sequence that is more likely in one
HMC than in the other HMC. A multi-monitor could exploit the
law of large numbers and only count how often that particular
observation prefix occurs. Hence for multi-monitors,
distinguishability and non-equivalence coincide.

(2) Consider now a monitor that has access to only a single
random observation sequence. Here, non-equivalence does not
imply distinguishability: loosely speaking, for some HMCs it is
the case that while the observation prefix is increasing, the
evidence added by each new observation does not help the
monitor enough to make up its mind about which HMC produces the
sequence. Figure 2 shows an example of two HMCs that are
neither equivalent nor distinguishable. (On the other hand, the
HMCs in Figure 1 are not equivalent, and they are
distinguishable, as shown later in Section 3.)

We assume in the rest of the paper that a monitor has access 
to only a single random observation sequence. This is the 
more natural version of the problem, both from the point 
of view of the motivation mentioned above and from our 
application in stochastic runtime monitoring. 

We prove that the distinguishability problem is decidable in
polynomial time. We establish this result by showing that two
HMCs are distinguishable if and only if their total variation
distance is equal to 1. This distance measure for HMCs was
studied in [3, and a polynomial-time algorithm for deciding
whether the distance of two HMCs is 1 was given there. That
polynomial-time algorithm includes a mechanism for checking
whether two given HMCs are equivalent (but also needs other
ingredients).

It is important to note that deciding distinguishability does
not readily provide a family of monitors as required by the
definition of distinguishability; it only guarantees their
existence. Developing a family of monitors (one for any desired
error bound $\varepsilon > 0$) requires more insights. Inspired
by the area of sequential analysis [30], we design monitors
that track the likelihood ratio of the sequence of
observations. However, estimating the error probability of the
monitors is challenging, since one needs a bound on the change
of the likelihood ratio per observation. Unfortunately, such a
bound does not exist for HMCs in general, not even on the
difference of the log-likelihood ratio (see Example 6). Hence,
in this paper we take a different route: We consider a
different class of monitors that translate the given random
observation sequence into a certain kind of non-homogeneous
“random walk” with bounded step size. This allows us to employ
martingale techniques, specifically Azuma’s inequality, to
prove error bounds that decay exponentially with the number of
observations the monitor makes. Then we show that the error
bounds from a random-walk monitor carry over to a
likelihood-based monitor.

More specifically, we present two likelihood-based monitors for
distinguishable HMCs. The first one makes a decision after
reading a fixed number of observation symbols. This number is
chosen depending on the desired error bound: we show that for
an error probability $\varepsilon$ it suffices to read the
prefix of length $C \log\frac{1}{\varepsilon}$, where $C > 0$
is a polynomial-time computable constant. This error is
two-sided, i.e., the monitor may mistake $H_1$ for $H_2$ and
vice versa.

The second monitor has only one-sided error: observation
sequences from $H_1$ are almost always (i.e., with
probability 1) recognized as stemming from $H_1$. However, on
sequences generated by $H_2$, with high probability the monitor
never gives an answer. This is useful in applications such as
runtime verification (see Section 6). The expected number of
observations from $H_1$ that the monitor processes before
giving its decision is $O(\log\frac{1}{\varepsilon})$, while
ensuring an error probability of at most $\varepsilon$ on
observations from $H_2$. For this class of monitors, we have a
polynomial-time algorithm that computes an
$O(\log\frac{1}{\varepsilon})$ upper bound on the expected
number of observations from $H_1$ before a decision is given.

Main Contributions. 

• We show that the distinguishability problem can be decided
in polynomial time (Section 3).

• We design two classes of likelihood-based monitors that
accomplish the following tasks ($\varepsilon > 0$ is an error
bound):

(1) After $O(\log\frac{1}{\varepsilon})$ observations (the
exact number can be efficiently computed from the given HMCs)
the first monitor class provides a guess about the source of
the observations, such that the probability that the guess is
wrong is at most $\varepsilon$ (Section 4.2). This can be
extended to more than two HMCs (Section 4.4).

(2) For the second monitor class, if $H_1$ produces the
observation sequence then the monitor raises an alarm almost
surely, and after an expected number of
$O(\log\frac{1}{\varepsilon})$ observations (such an upper
bound can be efficiently computed from the given HMCs and
$\varepsilon$); if $H_2$ produces the observation sequence
then, with probability at least $1 - \varepsilon$, the monitor
never raises an alarm (Section 4.3).

• We apply our results to stochastic runtime verification,
where a monitor should distinguish correct and faulty
behaviour of a single stochastic system. This yields
polynomial-time decidability of monitorability as defined in
[26], as well as efficient runtime monitors for stochastic
systems, see Section

Figure 2: Two HMCs. Here $H_1$ and $H_2$ are neither distinguishable nor equivalent.

Missing proofs can be found in the appendix. 

Related Work. The area of sequential analysis in statistics,
pioneered by Wald (see [30]), deals with the problem of
hypothesis testing using repeated and unbounded sampling. A
line of work going back to Phatarfod [21, 28, 24] investigated
the application of sequential analysis, more specifically the
sequential probability ratio test, to Markov chains. Similar to
our work, the goal in the above works is to identify a Markov
chain among several, in this case using likelihood ratios. A
monitor algorithm is derived by keeping track of likelihood
ratios: it gives notice once the likelihood ratio drops below
or exceeds some fixed threshold. One problem with this approach
is that error probabilities can only be estimated (not bounded)
by the heuristic assumption that the excess over the threshold
is not big. This assumption is not always true. A more
important difference from our work is that the observation in
each state equals the state; in other words, the Markov chains
are not hidden.

There is early related work that is more specific to HMCs. 
The paper [16] aims at measuring a certain distance between 
two HMCs by running one of them. This is in spirit close 
to our work, as a positive distance in their sense could be 
transformed to a monitor. However, the authors place strong 
assumptions on the Markov chains, in particular ergodicity. 
If this assumption is removed, their distance can be different 
for different runs, and the existence of a lower bound on the 
possible distances is unclear. 

Work by Alur et al. [3] also aims at distinguishing
probabilistic models, but there are important differences.
First, they consider Markov Decision Processes rather than
Markov chains, i.e., they consider strategies to distinguish
two such processes, which is a more general and computationally
harder problem (they show PSPACE- and EXPTIME-completeness
results). Second, their problems are defined such that the
exact values of the transition probabilities are unimportant.
In our case this is different.

The work in [1^ deals with comparing two HMCs in terms of
various distance measures. Among other results, they show
NP-hardness of computing and approximating the $L_1$-distance.
The HMCs considered there generate distributions on finite
strings of observations, as each HMC has a dedicated end state,
reached with probability 1, where the HMC “stops”. Such HMCs
form a subclass of HMCs, whereas we consider general HMCs.

Our work on distinguishability is inspired by the work on
monitorability that was defined in [^. In [26, Section 4.1] a
notion of strong monitorability is proposed and it is shown
that deciding it is PSPACE-complete. By our results in
Section 6, strong monitorability corresponds to a stronger form
of distinguishability, so the latter is PSPACE-complete as
well. In light of this it might be surprising that (general)
distinguishability turns out to be decidable in polynomial
time. In [^ it was wrongly claimed that monitorability is
undecidable for finite-state systems. Our result not only shows
that it is decidable, but also gives a polynomial-time decision
procedure.

Our work on exponentially decaying monitors is inspired by the
exponentially converging monitorable systems defined in m-. The
algorithms presented there are for a very restricted class of
HMCs, whereas our monitors work for all pairs of
distinguishable HMCs.

Closely related to some of our results is a very recent work by
Bertrand et al. [^. This paper also exploits the results of [3
to obtain polynomial-time decidability of “AA-diagnosability”
of stochastic systems, a problem related to monitorability
(Section 6). Although the technical report of our work had been
available m, the results in [^ were obtained independently and
are largely orthogonal to ours: whereas we focus on
constructing specific monitors with computable error bounds,
they investigate the decidability and complexity of several
variants of diagnosability.

2. DEFINITIONS 

Notation. For a countable set $S$, a probability distribution
$\psi$ over $S$ is a function $\psi : S \to [0,1]$ such that
$\sum_{s \in S} \psi(s) = 1$. For an element $s \in S$, we let
$\delta_s$ denote the unique distribution with
$\delta_s(s) = 1$. We let $Distr(S)$ denote the set of all
distributions over $S$. We let $S^*$, $S^\omega$ respectively
denote the set of finite sequences (strings) and the set of
infinite sequences of symbols from $S$. If $S$ is a finite set
then we let $|S|$ denote its cardinality. For any $u \in S^*$,
we let $|u|$ denote its length. For any real number $x$, we let
$|x|$ denote its absolute value.

Hidden Markov Chains. A Markov chain is a triple
$G = (S, R, \phi)$ where $S$ is a set of states,
$R \subseteq S \times S$, and $\phi : R \to (0,1]$ is such that
$\sum_{t : (s,t) \in R} \phi(s,t) = 1$ for all $s \in S$. A
Markov chain $G$ and an initial state $s \in S$ induce a
probability measure, denoted by $P_s$, on measurable subsets of
$\{s\}S^\omega$ in the usual way: more precisely, we consider
the $\sigma$-algebra generated by the cylinder sets
$\{s_0 s_1 \cdots s_n\}S^\omega$ for $n \ge 0$ and $s_0 = s$
and $s_i \in S$, with the probability measure $P_s$ such that

\[ P_s(\{s_0 s_1 \cdots s_n\}S^\omega) = \prod_{i=1}^{n} \phi(s_{i-1}, s_i). \]
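The product formula for cylinder sets can be checked numerically. The following is a minimal sketch, assuming the transition function $\phi$ is stored as a Python dictionary keyed by state pairs; the three-transition chain and the name `cylinder_probability` are hypothetical and serve only as illustration.

```python
from math import prod

def cylinder_probability(phi, path):
    """P_{s_0}({s_0 s_1 ... s_n} S^omega) for path = [s_0, ..., s_n]."""
    # Pairs outside R are absent from phi and contribute probability 0.
    return prod(phi.get((s, t), 0.0) for s, t in zip(path, path[1:]))

# Hypothetical chain (not one of the paper's figures).
phi = {("s0", "s0"): 0.5, ("s0", "s1"): 0.5, ("s1", "s0"): 1.0}

# phi(s0,s1) * phi(s1,s0) * phi(s0,s0)
print(cylinder_probability(phi, ["s0", "s1", "s0", "s0"]))
```

For $n = 0$ the product is empty, so the cylinder $\{s_0\}S^\omega$ gets probability 1, as expected.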

Let $\Sigma$ be a finite set. A Hidden Markov Chain (HMC), with
observation alphabet $\Sigma$, is a triple $(G, O, s_0)$, where
$G = (S, R, \phi)$ is a Markov chain, $O : S \to \Sigma$ is the
observation function, and $s_0 \in S$ is the initial state. We
may write $P$ for $P_{s_0}$. For
$\mathcal{L} \subseteq \Sigma^\omega$ we define the inverse
observation function

\[ [\mathcal{L}] := \{ s_0 s_1 \cdots \in S^\omega \mid O(s_0) O(s_1) \cdots \in \mathcal{L} \}. \]
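To make the definition concrete, the sketch below samples a finite prefix of the observation sequence of an HMC. It assumes a dictionary encoding of $\phi$ and $O$; the two-state HMC and the function name `sample_observations` are hypothetical, not taken from the paper's figures.

```python
import random

def sample_observations(phi, obs, s0, n, rng):
    """Return O(s_0) O(s_1) ... O(s_{n-1}) for a random run started in s0."""
    state, out = s0, []
    for _ in range(n):
        out.append(obs[state])  # the current state emits its observation
        successors = [(t, p) for (s, t), p in phi.items() if s == state]
        targets, weights = zip(*successors)
        state = rng.choices(targets, weights=weights, k=1)[0]
    return "".join(out)

# Hypothetical HMC: state p emits a, state q emits b.
phi = {("p", "p"): 0.75, ("p", "q"): 0.25, ("q", "p"): 0.75, ("q", "q"): 0.25}
obs = {"p": "a", "q": "b"}

print(sample_observations(phi, obs, "p", 12, random.Random(0)))
```

Only the emitted symbols are returned; the visited states stay hidden from the caller, which mirrors the setting of a monitor.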

Monitors. A monitor $M : \Sigma^* \to \{\bot, 1\}$ is a
computable function with the property that, for any
$u \in \Sigma^*$, if $M(u) = 1$ then $M(uv) = 1$ for every
$v \in \Sigma^*$. Let $\mathcal{L}(M) \subseteq \Sigma^\omega$
denote the set of infinite sequences that have a prefix $u$
with $M(u) = 1$. (Intuitively, $\mathcal{L}(M)$ is the set of
observation sequences which the monitor decides to have been
generated by the first HMC among a pair of such HMCs.) Given an
HMC, the event $[\mathcal{L}(M)]$ is measurable, as it is a
countable union of cylinder sets.
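The monotonicity requirement ($M(u) = 1$ implies $M(uv) = 1$) can be enforced mechanically. The sketch below wraps an arbitrary decision predicate on finite strings so that the result is a monitor in the above sense; `latch` and the example predicate are hypothetical names, and `None` stands in for $\bot$.

```python
def latch(decide):
    """Turn any predicate on finite strings into a monitor.

    The returned function outputs 1 as soon as some prefix satisfies
    `decide`, and keeps outputting 1 on every extension.
    """
    def monitor(u):
        hit = any(decide(u[:k]) for k in range(len(u) + 1))
        return 1 if hit else None  # None plays the role of "bottom"
    return monitor

# Hypothetical predicate: accept once two consecutive b's have been seen.
M = latch(lambda prefix: prefix.endswith("bb"))
print(M("ab"), M("abb"), M("abba"))
```

This recomputes all prefixes on every call, which is wasteful; a practical monitor would keep a single flag as state.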

Distinguishability. Given two HMCs $H_1, H_2$ with the same
observation alphabet $\Sigma$, we write
$P_1, P_2, [\cdot]_1, [\cdot]_2$ for their associated
probability measures and inverse observation functions. HMCs
$H_1, H_2$ are called distinguishable if for every
$\varepsilon > 0$ there exists a monitor $M$ such that

\[ P_1([\mathcal{L}(M)]_1) \ge 1 - \varepsilon \quad\text{and}\quad P_2([\mathcal{L}(M)]_2) \le \varepsilon. \]

3. POLYNOMIAL-TIME DECIDABILITY OF THE DISTINGUISHABILITY PROBLEM

For two HMCs $H_1, H_2$ define the (total variation) distance
between $H_1$ and $H_2$, denoted by $d(H_1, H_2)$, as follows:

\[ d(H_1, H_2) := \sup_{E \subseteq \Sigma^\omega} \left| P_1([E]_1) - P_2([E]_2) \right|, \]

where the supremum ranges over all measurable subsets of
$\Sigma^\omega$. It is shown in [7] that the supremum is in
fact a maximum. In particular, if $d(H_1, H_2) = 1$ then there
exists a measurable set $E \subseteq \Sigma^\omega$ with
$P_1([E]_1) = 1$ and $P_2([E]_2) = 0$. We show:

Proposition 1. HMCs $H_1, H_2$ are distinguishable if and only
if $d(H_1, H_2) = 1$.

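Proof. Let $H_1, H_2$ be two given HMCs. We show that
$H_1, H_2$ are distinguishable if and only if
$d(H_1, H_2) = 1$.

• “if”: Let $d(H_1, H_2) = 1$. Choose $\varepsilon > 0$
arbitrarily. It follows from [3 Theorem 7] and the discussion
after [3 Proposition 5] that there are $k \in \mathbb{N}$ and
$W \subseteq \Sigma^k$ such that

\[ P_1([W\Sigma^\omega]_1) \ge 1 - \varepsilon \quad\text{and}\quad P_2([W\Sigma^\omega]_2) \le \varepsilon. \]

Construct a monitor $M$ that outputs 1 after having read a
string in $W$. Then we have $\mathcal{L}(M) = W\Sigma^\omega$.
It follows:

\[ P_1([\mathcal{L}(M)]_1) \ge 1 - \varepsilon \quad\text{and}\quad P_2([\mathcal{L}(M)]_2) \le \varepsilon. \]

Since $\varepsilon$ was chosen arbitrarily, the HMCs
$H_1, H_2$ are distinguishable.

• “only if”: Let $H_1, H_2$ be distinguishable, i.e., for every
$\varepsilon > 0$ there exists a monitor $M_\varepsilon$ such
that

\[ P_1([\mathcal{L}(M_\varepsilon)]_1) \ge 1 - \varepsilon \quad\text{and}\quad P_2([\mathcal{L}(M_\varepsilon)]_2) \le \varepsilon. \]

Then we have:

\begin{align*}
d(H_1, H_2) &= \sup_{E \subseteq \Sigma^\omega} \left| P_1([E]_1) - P_2([E]_2) \right| \\
&\ge \sup_{\varepsilon > 0} \left( P_1([\mathcal{L}(M_\varepsilon)]_1) - P_2([\mathcal{L}(M_\varepsilon)]_2) \right) \\
&\ge \sup_{\varepsilon > 0} (1 - 2\varepsilon) = 1.
\end{align*}

Since $d(H_1, H_2) \le 1$ holds by definition, we obtain
$d(H_1, H_2) = 1$. This concludes the proof. □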

It follows that HMCs $H_1, H_2$ are distinguishable iff there
is a distinguishing event, i.e., a set
$E \subseteq \Sigma^\omega$ with $P_1([E]_1) = 1$ and
$P_2([E]_2) = 0$.

Example 2. Consider the HMCs $H_1, H_2$ from Figure 1. By
computing the stationary distributions, one can show that the
set

\[ E = \left\{ \sigma_1 \sigma_2 \cdots \in \Sigma^\omega \;\middle|\; \lim_{n \to \infty} \frac{f(n)}{n} = 5/7 \right\}, \]

where $f(n)$ denotes the number of occurrences of $a$ in the
prefix $\sigma_1 \sigma_2 \cdots \sigma_n$, is a distinguishing
event for $H_1, H_2$. Hence $H_1, H_2$ are distinguishable.
Here, counting the frequencies of the observation symbols
suffices for distinguishing the two HMCs. In general, this is
not true: the order of observations may matter. □

Proposition 1 implies the following theorem:

Theorem 3. One can decide in polynomial time whether given HMCs
$H_1, H_2$ are distinguishable.

Proof. In [3 Algorithm 1 and Theorem 21] it is shown that,
given two HMCs $H_1, H_2$, one can decide in polynomial time
whether $d(H_1, H_2) = 1$. (The algorithm given there solves
$n_1$ linear programs, each with $n_1 + n_2$ variables, where
$n_1, n_2$ are the numbers of states in $H_1, H_2$,
respectively.) Then the result follows from Proposition 1. □

Distinguishing events cannot in general be defined by monitors,
as a monitor can reject an observation sequence only on the
basis of a finite prefix. Moreover, the decision algorithm for
Theorem 3 can assure the existence of a monitor for two given
HMCs, but it does not provide useful monitors. That is the
subject of the next section.

4. MONITORS 

In this section, we present concrete monitors, with error
bounds. To this end we give some additional definitions in
Section 4.1, where we also explain how monitors can keep track
of certain conditional distributions. We also introduce
“profiles”, a key concept for our proofs of error bounds. In
Sections 4.2 and 4.3 we present monitors for distinguishable
HMCs with two-sided and one-sided error, respectively. In
Section 4.4 we provide a monitor for distinguishing among
multiple HMCs.

For $i = 1,2$, let $H_i = (G_i, O_i, s_{i,0})$ be two HMCs with
the same observation alphabet $\Sigma$, where
$G_i = (S_i, R_i, \phi_i)$. Without loss of generality we
assume $S_1 \cap S_2 = \emptyset$. Let $m := |S_1| + |S_2|$. We
fix $H_1, H_2$ and $m$ throughout the section.


4.1 Keeping Track of Probabilities and Profiles

Let $i \in \{1,2\}$ and $\psi \in Distr(S_i)$. For
$u \in \Sigma^*$ define

\[ pr_i(\psi, u) := \sum_{s \in S_i} \psi(s) \cdot P_{i,s}([u\Sigma^\omega]_i). \]

Intuitively, $pr_i(\psi, u)$ is the probability that the string
$u$ is output by HMC $H_i$ starting from the initial
distribution $\psi$. For $W \subseteq \Sigma^*$ we also define

\[ pr_i(\psi, W) := \sum_{u \in W} pr_i(\psi, u), \]

which is the probability that $H_i$ outputs a string in $W$
starting from distribution $\psi$. For $u \in \Sigma\Sigma^*$
and $s, t \in S_i$ define

\[ sub_i(s, u, t) := P_{i,s}\left([u\Sigma^\omega]_i \cap S_i^{|u|-1}\{t\}S_i^\omega\right). \]

Intuitively, $sub_i(s, u, t)$ is the probability that $H_i$
outputs $u$ and is then in state $t$, starting from state $s$.
We have:

\[ pr_i(\psi, u) = \sum_{s \in S_i} \sum_{t \in S_i} \psi(s) \cdot sub_i(s, u, t). \tag{1} \]

For any $s, r \in S_i$ and $u \in \Sigma\Sigma^*$ and
$a \in \Sigma$ we have:

\[ sub_i(s, ua, r) = \begin{cases} \sum_{t \in S_i} sub_i(s, u, t)\,\phi_i(t, r) & \text{if } O_i(r) = a \\ 0 & \text{otherwise.} \end{cases} \]

So if a monitor has kept track of the values
$(sub_i(s, u, t))_{s,t \in S_i}$ for a prefix $u$ of an
observation sequence, it can, upon reading the next observation
$a$, efficiently compute $(sub_i(s, ua, t))_{s,t \in S_i}$ and,
by (1), also $pr_i(\delta_{s_{i,0}}, ua)$.

For $u \in \Sigma^*$ define the likelihood ratio

\[ lr(u) := \frac{pr_2(\delta_{s_{2,0}}, u)}{pr_1(\delta_{s_{1,0}}, u)}. \]

Finally, for $u \in \Sigma\Sigma^*$ with $pr_i(\psi, u) > 0$,
define the distribution $cd_i(\psi, u)$ (which stands for
“conditional distribution”) as follows:

\[ cd_i(\psi, u)(t) := \frac{1}{pr_i(\psi, u)} \sum_{s \in S_i} \psi(s) \cdot sub_i(s, u, t) \quad \text{for } t \in S_i. \]

Intuitively, $cd_i(\psi, u)(t)$ is the conditional probability
that $H_i$ is in state $t$ given that it has output $u$ and
started from $\psi$. As explained above, a monitor can
efficiently keep track of $lr(u)$ and $cd_i(\psi, u)$.
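The bookkeeping described above is a forward computation: it suffices to maintain, for each state $t$, the weight $\sum_s \psi(s) \cdot sub_i(s, u, t)$. A minimal sketch follows, assuming a dictionary encoding of $\phi_i$ and $O_i$; the function names and the two-state HMC are hypothetical.

```python
def forward(phi, obs, psi, u):
    """Return {t: sum_s psi(s) * sub(s, u, t)} for a non-empty string u."""
    # Base case: the first symbol is emitted by the start state itself.
    alpha = {s: w for s, w in psi.items() if obs[s] == u[0]}
    for a in u[1:]:
        nxt = {}
        for (t, r), p in phi.items():
            # Recurrence for sub(s, ua, r): only states r with O(r) = a count.
            if t in alpha and obs[r] == a:
                nxt[r] = nxt.get(r, 0.0) + alpha[t] * p
        alpha = nxt
    return alpha

def pr(phi, obs, psi, u):
    """pr(psi, u), obtained by summing the forward weights as in (1)."""
    return sum(forward(phi, obs, psi, u).values())

def cd(phi, obs, psi, u):
    """cd(psi, u): the forward weights normalised by pr(psi, u) > 0."""
    alpha = forward(phi, obs, psi, u)
    total = sum(alpha.values())
    return {t: w / total for t, w in alpha.items()}

# Hypothetical HMC: state p emits a, state q emits b.
phi = {("p", "p"): 0.75, ("p", "q"): 0.25, ("q", "p"): 0.75, ("q", "q"): 0.25}
obs = {"p": "a", "q": "b"}
start = {"p": 1.0}  # the Dirac distribution on p

print(pr(phi, obs, start, "ab"), cd(phi, obs, start, "ab"))
```

Each new observation costs one pass over the transitions, so the update is linear in $|R_i|$. For long sequences the weights underflow in floating point; a practical monitor would rescale them or work with logarithms.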

We say that a pair of distributions
$(\psi_1, \psi_2) \in Distr(S_1) \times Distr(S_2)$ is
reachable in $(H_1, H_2)$ if there is $u \in \Sigma\Sigma^*$
with $\psi_i = cd_i(\delta_{s_{i,0}}, u)$ for $i = 1,2$. A
profile for $H_1, H_2$ is a pair $(A, c)$ such that
$A : Distr(S_1) \times Distr(S_2) \to 2^{\Sigma^m}$ and
$c \in (0,1]$ and

\[ pr_1(\psi_1, A(\psi_1, \psi_2)) - pr_2(\psi_2, A(\psi_1, \psi_2)) \ge c \]

holds for all reachable pairs $(\psi_1, \psi_2)$ of
distributions. For the monitors presented in this section the
following proposition is crucial.

Proposition 4. Let HMCs $H_1, H_2$ be distinguishable. Then
there is a number $c > 0$, computable in time polynomial in the
sizes of $H_1, H_2$, such that there is a profile $(A, c)$.

4.2 Monitors with Two-Sided Error 

In this and the next subsection, we assume that $H_1, H_2$ are
distinguishable, and fix a profile $(A, c)$. The monitors of
this subsection take an observation sequence as input, and at
some point output a value from $\{1, 2, 3\}$ indicating a
decision regarding which of the two HMCs generated the
observations. An output of 3 indicates that neither of the HMCs
could have generated it. The monitors of this subsection have
two-sided errors: the answers 1 or 2 may be wrong (with a small
probability).

We define a likelihood-based monitor $M_2$ (the subscript
denotes two-sided error) as follows. Monitor $M_2$ runs in
phases; in each phase, the monitor receives $m$ observations.
The monitor runs at most $N$ phases, where $N \in \mathbb{N}$
is a parameter fixed in advance: choosing a larger $N$ leads to
smaller error probabilities. After reading an observation
sequence $u$ of length $N \cdot m$, it computes the likelihood
ratio $lr(u)$. Monitor $M_2$ outputs 1 if $lr(u) < 1$, and 2 if
$lr(u) > 1$. It may output either 1 or 2 if $lr(u) = 1$.
Monitor $M_2$ needs no access to the function $A$.

The following theorem says that the observation sequences for
which monitor $M_2$ outputs 1 are much more likely to be
generated by $H_1$. By symmetry, the observation sequences for
which $M_2$ outputs 2 are much more likely to be generated by
$H_2$.
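The decision rule of the likelihood-based monitor can be sketched as follows. The two HMCs, the function names and the tuple encoding `(phi, obs, s0)` are hypothetical. The sketch compares the two likelihoods directly instead of forming the ratio, which avoids a division by zero; breaking ties in favour of 1 is one of the two choices the definition permits, and the case where both likelihoods are zero is not handled specially here.

```python
def likelihood(phi, obs, s0, u):
    """pr(delta_{s0}, u): probability that the HMC emits the non-empty string u."""
    alpha = {s0: 1.0} if obs[s0] == u[0] else {}
    for a in u[1:]:
        nxt = {}
        for (t, r), p in phi.items():
            if t in alpha and obs[r] == a:
                nxt[r] = nxt.get(r, 0.0) + alpha[t] * p
        alpha = nxt
    return sum(alpha.values())

def monitor_two_sided(h1, h2, m, n_phases, observations):
    """Read the first n_phases * m observations and output 1 or 2."""
    u = observations[: n_phases * m]
    if len(u) < n_phases * m:
        raise ValueError("not enough observations for the requested phases")
    pr1 = likelihood(*h1, u)
    pr2 = likelihood(*h2, u)
    # lr(u) <= 1 is the same as pr2 <= pr1 (ties are resolved as output 1).
    return 1 if pr2 <= pr1 else 2

# Hypothetical HMCs: H1 favours a-emitting states, H2 favours b-emitting states.
H1 = ({("p", "p"): 0.75, ("p", "q"): 0.25, ("q", "p"): 0.75, ("q", "q"): 0.25},
      {"p": "a", "q": "b"}, "p")
H2 = ({("x", "x"): 0.25, ("x", "y"): 0.75, ("y", "x"): 0.25, ("y", "y"): 0.75},
      {"x": "a", "y": "b"}, "x")

print(monitor_two_sided(H1, H2, m=4, n_phases=2, observations="aaaaaaaa"))
```

Here $m = |S_1| + |S_2| = 4$, so two phases consume eight observations; any further symbols are ignored.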

Theorem 5. Consider the monitor $M_2$ that reads the first
$N \cdot m$ observations. Let
$\mathcal{L}(M_2) \subseteq \Sigma^\omega$ be the set of
observation sequences for which $M_2$ outputs 1. Then we have

\[ P_1([\mathcal{L}(M_2)]_1) - P_2([\mathcal{L}(M_2)]_2) \ge 1 - 2\exp\left(-\frac{c^2}{18} \cdot N\right). \]

Hence,

\[ P_1([\mathcal{L}(M_2)]_1) \ge 1 - 2\exp\left(-\frac{c^2}{18} \cdot N\right) \quad\text{and}\quad P_2([\mathcal{L}(M_2)]_2) \le 2\exp\left(-\frac{c^2}{18} \cdot N\right). \]

Proving the bounds of Theorem 5 is challenging due to the
following reasons. For $k \ge 0$ define a random variable
$L_k : \{s_{1,0}\}S_1^\omega \to \mathbb{Q}$ by

\[ L_k(s_{1,0} s_1 s_2 \cdots) := lr(O_1(s_{1,0}) O_1(s_1) O_1(s_2) \cdots O_1(s_{k-1})). \]

Denote by $\mathcal{E}_1$ the expectation with respect to
$P_1$. It was proved in [3 proof of Proposition 6] that
$\mathcal{E}_1(L_{k+1} \mid L_k = x) = x$ holds for all
$x \in \mathbb{Q}$, i.e., the sequence $L_0, L_1, \ldots$ is a
martingale. Unfortunately, the differences $|L_{k+1} - L_k|$
are not bounded, neither are the differences
$|\log L_{k+1} - \log L_k|$, as the following example shows.

Example 6. Consider the HMCs $H_1, H_2$ in Figure 3.
For $n \ge 1$, the probability that $H_1$ generates the string
$a^n$ is (|)”~^ + I • X)"Tg^(|)* which is easily shown to
be i(l -I- (|)"~^), and the probability that $H_1$ generates
$a^n b$ is (|)". The corresponding probabilities for $H_2$ are
and (^)", respectively. Now consider any α ∈
{sq S 2 }{so, si, 52 }“, for some $n \ge 1$. The two likelihood
ratios $L_n(\alpha)$ and $L_{n+1}(\alpha)$ corresponding to the length $n$ and

Figure 3: Two HMCs where the difference in log-likelihood ratios is unbounded


(i)"-i 

length n +1 prefixes of a, are given by Ln{a) = i 

and L„+i{a) = (f)". Since n > 1 , we see that Ln{a) < 
2 • (5)"”^ < 1. Hence, ^ (I)**- have that 

log(L„+i(Q)) — log(L„(a)) > n-log(|), which is unbounded 
with increasing n. In a more general case, if a has b appear¬ 
ing infinitely often with an increasing number of a-symbols 
between two successive b-symbols, then the difference in the 
log-likelihood ratio of two successive prefixes of a, with the 
second prefix ending with b, is unbounded. □ 


This problem of unbounded differences between subsequent
log-likelihood ratios prohibits a standard error analysis of
hypothesis-testing methods from sequential analysis m-.
Moreover, Azuma’s inequality then does not yield an
exponentially decaying error bound. As a consequence, we cannot
directly prove the bounds of Theorem 5. Therefore, in this
subsection, we take a detour. First we develop another monitor
$M_2'$ that is not based on likelihoods but on a random walk.
Then we prove error bounds for $M_2'$. Then we show that the
error bounds for $M_2'$ carry over to the likelihood-based
monitor $M_2$.

The monitor $M_2'$ also runs in $N$ phases, receiving $m$
observations in each phase. The monitor maintains two
probability distributions $\psi_1 \in Distr(S_1)$,
$\psi_2 \in Distr(S_2)$, and a variable $x$ that takes rational
values. Initially, $\psi_1, \psi_2$ are set to
$\delta_{s_{1,0}}, \delta_{s_{2,0}}$ respectively, and $x$ is
initialized to 0. The monitor keeps track of
$\psi_i = cd_i(\delta_{s_{i,0}}, u)$, for $i = 1,2$, where $u$
is the observation string received thus far. The variable $x$
indicates a current estimate about which of the two HMCs is
being observed: a negative value of $x$ indicates a preference
for $H_1$; a positive value indicates a preference for $H_2$.
In each phase, $M_2'$ waits until it gets the next $m$
observations and then updates $x$, $\psi_1$ and $\psi_2$.

We describe a phase of $M_2'$. Let $\psi_1, \psi_2, x$ be the
values at the end of the previous phase. Let
$p_1 = pr_1(\psi_1, A(\psi_1, \psi_2))$ and
$p_2 = pr_2(\psi_2, A(\psi_1, \psi_2))$. By the definition of a
profile we have $p_1 - p_2 \ge c > 0$. Denote by
$v \in \Sigma^m$ the string of observations received in the
current phase. Assume that $pr_1(\psi_1, v) > 0$ and
$pr_2(\psi_2, v) > 0$ (i.e., $v$ can be generated with non-zero
probability by both $H_1, H_2$ from $\psi_1, \psi_2$
respectively). If $p_1 + p_2 \le 1$ then $x$ is updated as
follows:

\[ x := \begin{cases} x - 1 & \text{if } v \in A(\psi_1, \psi_2) \\ x + \frac{p_1 + p_2}{2 - p_1 - p_2} & \text{if } v \notin A(\psi_1, \psi_2) \end{cases} \]

If $p_1 + p_2 > 1$ then $x$ is updated as follows:

\[ x := \begin{cases} x - \frac{2 - p_1 - p_2}{p_1 + p_2} & \text{if } v \in A(\psi_1, \psi_2) \\ x + 1 & \text{if } v \notin A(\psi_1, \psi_2) \end{cases} \]

Note that in all cases, the value of $x$ is increased or
decreased by at most 1. After this, $\psi_1, \psi_2$ are set to
$cd_1(\psi_1, v)$ and $cd_2(\psi_2, v)$ respectively, and the
phase is finished. On the other hand, if $pr_1(\psi_1, v) > 0$
and $pr_2(\psi_2, v) = 0$ then 1 is output; if
$pr_1(\psi_1, v) = 0$ and $pr_2(\psi_2, v) > 0$ then 2 is
output; if $pr_1(\psi_1, v) = 0$ and $pr_2(\psi_2, v) = 0$ then
3 is output. In those cases the monitor terminates immediately.
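The step rule of the random walk can be written as a small function. This is a sketch under the assumption that the rule decreases $x$ when the phase string lies in $A(\psi_1, \psi_2)$ and increases it otherwise, with the step sizes $\frac{p_1+p_2}{2-p_1-p_2}$ and $\frac{2-p_1-p_2}{p_1+p_2}$ in the two cases; the function name `update_x` is hypothetical. Under this rule the expected step is $-(p_1 - p_2)/(2 - p_1 - p_2)$ or $-(p_1 - p_2)/(p_1 + p_2)$ when $H_1$ generates the observations, which is at most $-c/2$, and the mirror-image positive value when $H_2$ does.

```python
def update_x(x, p1, p2, v_in_A):
    """One phase of the random walk; every step has absolute value at most 1."""
    s = p1 + p2
    if s <= 1:
        return x - 1 if v_in_A else x + s / (2 - s)
    return x - (2 - s) / s if v_in_A else x + 1

def expected_step(p_in_A, p1, p2):
    """Expected change of x when the phase string lies in A with probability p_in_A."""
    step_in = update_x(0.0, p1, p2, True)
    step_out = update_x(0.0, p1, p2, False)
    return p_in_A * step_in + (1 - p_in_A) * step_out

# Hypothetical values with p1 - p2 = 0.4.
print(expected_step(0.6, 0.6, 0.2))  # drift when H1 generates the observations
print(expected_step(0.2, 0.6, 0.2))  # drift when H2 generates the observations
```

The bounded step size together with a drift bounded away from zero is what makes an Azuma-style concentration argument applicable.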

After $N$ phases, if $x \le 0$ then the monitor $M_2'$
outputs 1, otherwise it outputs 2. An output of $i$ indicates
that the sequence is believed to be generated by $H_i$. Note
that $M_2'$, in contrast to $M_2$, needs access to the function
$A$. By constructing a supermartingale and applying Azuma’s
inequality we obtain:

Theorem 7. Consider the monitor $M_2'$ running $N$ phases. Let
$\mathcal{L}(M_2') \subseteq \Sigma^\omega$ be the set of
observation sequences for which $M_2'$ outputs 1. Then,

\[ P_1([\mathcal{L}(M_2')]_1) \ge 1 - \exp\left(-\frac{c^2}{18} \cdot N\right) \quad\text{and}\quad P_2([\mathcal{L}(M_2')]_2) \le \exp\left(-\frac{c^2}{18} \cdot N\right). \]
Hence the error probability decays exponentially with $N$. To
prove Theorem 5 we show (in the appendix) that the same error
bound, up to a factor of 2, holds for the likelihood-based
monitor $M_2$. The authors are not aware of a proof of
Theorem 5 that avoids reasoning about a monitor like $M_2'$.
The proof shows that the difference
$P_1([\mathcal{L}(M_2)]_1) - P_2([\mathcal{L}(M_2)]_2)$ cannot
be increased by any other monitor that is based solely on the
first $N \cdot m$ observations: $M_2$ is optimal in that
respect.

To guarantee an error probability bound of at most
$\varepsilon$ for the likelihood-based monitor $M_2$, we set
$N = \left\lceil \frac{18}{c^2} \cdot \log\frac{2}{\varepsilon} \right\rceil$.
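As a worked instance, the number of phases can be computed directly from $c$ and $\varepsilon$. The sketch below assumes that the bound for the likelihood-based monitor has the form $2\exp(-\frac{c^2}{18} N)$ with the natural logarithm, as reconstructed from the surrounding statements; the function name `phases_needed` is hypothetical.

```python
import math

def phases_needed(c, eps):
    """Smallest N with 2 * exp(-c**2 * N / 18) <= eps (assumed bound form)."""
    if not (0 < c <= 1) or not (0 < eps < 1):
        raise ValueError("expected 0 < c <= 1 and 0 < eps < 1")
    return math.ceil(18 / c**2 * math.log(2 / eps))

def error_bound(c, n_phases):
    """The assumed two-sided error bound after n_phases phases."""
    return 2 * math.exp(-c**2 * n_phases / 18)

n = phases_needed(c=1.0, eps=0.02)
print(n, error_bound(1.0, n))
```

The monitor then reads $N \cdot m$ observations in total, so the observation count grows like $\frac{1}{c^2}\log\frac{1}{\varepsilon}$ for fixed $m$.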


Example 8. Figure 4 shows two HMCs $H_1, H_2$ with a parameter
$\delta \in (0, \frac{1}{2}]$. In every step except the first
one, $H_1$ outputs $a$ with probability $\frac{1}{2} + \delta$,
and $b$ with probability $\frac{1}{2} - \delta$. For $H_2$ the
probabilities are reversed. The HMCs are distinguishable. The
intuitive reason is that $H_1$ tends to output more $a$-symbols
than $b$-symbols, whereas $H_2$ tends to output more
$b$-symbols than $a$-symbols, and this difference is exhibited
in the long run. Intuitively speaking, the smaller $\delta$ is,
the “less distinguishable” are $H_1$ and $H_2$. We will show
later that there is a profile with $c = \delta$. By Theorem 5
the probability that $M_2$ mistakes $H_2$ for $H_1$ decays
exponentially. More specifically, for an error bound of
$\varepsilon$ it suffices to make
$\frac{B}{\delta^2} \log\frac{1}{\varepsilon}$ observations,
for a constant $B > 0$. It can be shown that there is a
constant $d > 0$ such that $M_2$ needs, for small
$\varepsilon$, at least
$d\,\delta^{-2} \log\frac{1}{\varepsilon}$ observations to push
the error probability below $\varepsilon$. Hence, for the HMCs
from Figure 4 the bound of Theorem 5 is asymptotically tight.
As mentioned after the proof of Theorem 5, the likelihood-based
monitor is essentially optimal among monitors that observe a
fixed-length prefix. So the bound from Theorem 7 is also
asymptotically tight. □

Figure 4: Two distinguishable HMCs with a parameter $\delta \in (0, \frac{1}{2}]$

4.3 Monitors with One-Sided Error 

Now we present $M_1$, a likelihood-based monitor with one-sided
error. Monitor $M_1$ uses a threshold parameter
$low \in (0, 1]$. For each $N > 0$, after reading a prefix $v$,
of length $N \cdot m$, of observations, it computes the
likelihood ratio $lr(v)$. If $lr(v) \le low$, it terminates
outputting 1, otherwise it continues.
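A streaming version of this monitor can be sketched as follows, again with hypothetical HMCs and function names. The likelihoods of both HMCs are updated with every observation, and the threshold test is performed after each block of $m$ observations. The sketch returns the number of observations read when the monitor stops, and `None` if the finite input is exhausted first.

```python
def advance(phi, obs, alpha, a):
    """Extend the forward weights alpha by one observation a."""
    nxt = {}
    for (t, r), p in phi.items():
        if t in alpha and obs[r] == a:
            nxt[r] = nxt.get(r, 0.0) + alpha[t] * p
    return nxt

def monitor_one_sided(h1, h2, m, low, stream):
    """Return the position at which the monitor outputs 1, or None."""
    (phi1, obs1, s1), (phi2, obs2, s2) = h1, h2
    a1 = a2 = None
    for k, a in enumerate(stream, start=1):
        if k == 1:
            # The first observation is emitted by the initial state.
            a1 = {s1: 1.0} if obs1[s1] == a else {}
            a2 = {s2: 1.0} if obs2[s2] == a else {}
        else:
            a1 = advance(phi1, obs1, a1, a)
            a2 = advance(phi2, obs2, a2, a)
        if k % m == 0:
            pr1, pr2 = sum(a1.values()), sum(a2.values())
            # lr <= low, written without division; never fires if pr1 == 0.
            if pr1 > 0 and pr2 <= low * pr1:
                return k
    return None

H1 = ({("p", "p"): 0.75, ("p", "q"): 0.25, ("q", "p"): 0.75, ("q", "q"): 0.25},
      {"p": "a", "q": "b"}, "p")
H2 = ({("x", "x"): 0.25, ("x", "y"): 0.75, ("y", "x"): 0.25, ("y", "y"): 0.75},
      {"x": "a", "y": "b"}, "x")

print(monitor_one_sided(H1, H2, m=4, low=0.01, stream="a" * 20))
```

On input that looks like $H_2$ the likelihood ratio grows instead of shrinking, so the monitor keeps reading without ever giving an answer, which is the one-sided behaviour described above.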

For any infinite sequence u and integer i > 0, let u[i\ 
denote the prefix of u of length i. We fix an integer N > 0. 
Let Un be the set of all u G such that lr{u[N ■ m]) < 
exp(—• N"). Recall from Theorem [5] the set £(M 2 ) of 
observation sequences for which M 2 outputs 1. It should 
be easy to see that Un C C{M 2 ). We need the following 
technical lemma. 

Lemma 9. F’i([C/Ar]i) > 1 - 4exp ^ 

This allows us to prove the following theorem:

Theorem 10. Consider the monitor $M_1$ with threshold
parameter $low \in (0,1]$. Let $L(M_1) \subseteq \Sigma^\omega$ be the set of
observation sequences for which $M_1$ terminates (and hence
outputs 1). Then,

$$P_1([L(M_1)]_1) = 1 \quad\text{and}\quad P_2([L(M_1)]_2) \le low.$$

Now we analyze the response time of $M_1$ taken on
observation sequences generated by $H_1$. Formally, we define
a random variable $T : \{s_{1,0}\}S_1^\omega \to \mathbb{N}$ such that $T$ is the
number of observations made by monitor $M_1$ before
outputting 1. The following proposition bounds the expected
value of $T$ in $H_1$.

Proposition 11. $E_1(T) \le \frac{36m}{c^2} \cdot \log \frac{1}{low} + \frac{147m}{c^2} \cdot low + m$,
where $E_1(T)$ is the expected value of $T$ under the probability
measure $P_1$.

The proof of this proposition employs ideas similar to those
in [?] for proving an upper bound on the expected monitoring
time for exponentially converging monitorable systems.
Observe that as $low$ decreases, the first term of the bound
dominates.

4.4 Monitors for Distinguishing Among Multiple HMCs

Now we address the problem of distinguishing among
multiple mutually distinguishable HMCs. We present a monitor
based on likelihoods. For $i = 1, \ldots, k$, let $H_i = (G_i, O_i, s_{i,0})$
be HMCs with the same observation alphabet $\Sigma$, where
$G_i = (S_i, R_i, \phi_i)$. Let $P_i$ and $[\cdot]_i$ be the associated probability
measures and inverse observation functions corresponding
to the HMC $H_i$. We assume that they are mutually
distinguishable, i.e., for $1 \le i < j \le k$, HMCs $H_i$ and $H_j$
are distinguishable. So by Proposition 4 there are profiles
$(A_{i,j}, c_{i,j})$. Define $c := \min\{c_{i,j} \mid 1 \le i < j \le k\}$.

Let $m := 2 \cdot \max\{|S_i| \mid 1 \le i \le k\}$ and let $N > 0$ be an integer
parameter. The following monitor $M$ distinguishes among
the $k$ HMCs: it takes an observation sequence $u \in \Sigma^{N \cdot m}$ as
input and outputs the smallest integer $i \in \{1, \ldots, k\}$ such
that $pr_i(\delta_{s_{i,0}}, u) \ge pr_j(\delta_{s_{j,0}}, u)$ for all $j \in \{1, \ldots, k\}$.
Essentially, $M$ outputs the index of the HMC whose likelihood
value is the highest after $N \cdot m$ observations. By applying
the union bound to Theorem 5 we get:

Theorem 12. Consider the monitor $M$. Let $i \in \{1, \ldots, k\}$
and let $L_i \subseteq \Sigma^{N \cdot m}$ be the set of observation
sequences for which $M$ outputs $i$. Then we have for all
$j \in \{1, \ldots, k\} - \{i\}$:

$$P_i([L_i \Sigma^\omega]_i) \ge 1 - 2k \cdot \exp\left(-\frac{c^2}{18} \cdot N\right) \quad\text{and}\quad P_j([L_i \Sigma^\omega]_j) \le 2 \exp\left(-\frac{c^2}{18} \cdot N\right).$$
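The likelihood-based monitor $M$ for several HMCs (Section 4.4) can be sketched as follows. The sketch assumes the same hypothetical `(trans, label)` encoding of an HMC as a pair of dictionaries, and it assumes the error bound $2k \cdot \exp(-\frac{c^2}{18} N)$ of Theorem 12 when computing how many phases suffice for a target error `eps`; the function names are illustrative only.

```python
import math
from fractions import Fraction

def likelihood(hmc, init, observations):
    """pr(delta_init, u): probability that the HMC emits the observations."""
    trans, label = hmc
    dist = {init: Fraction(1)}
    for obs in observations:
        new = {}
        for s, p in dist.items():
            if label[s] == obs:  # only states emitting obs contribute
                for t, q in trans[s].items():
                    new[t] = new.get(t, 0) + p * q
        dist = new
    return sum(dist.values())

def monitor_multi(hmcs, inits, observations):
    """Return the smallest 1-based index whose likelihood is maximal."""
    liks = [likelihood(h, s, observations) for h, s in zip(hmcs, inits)]
    best = 0
    for i in range(1, len(liks)):
        if liks[i] > liks[best]:  # strict: ties keep the smaller index
            best = i
    return best + 1

def phases_needed(c, k, eps):
    """Smallest N with 2k * exp(-c^2 N / 18) <= eps (assumed bound)."""
    return math.ceil(18 / c**2 * math.log(2 * k / eps))

def coin_hmc(p):
    """Toy HMC (hypothetical): state 0 emits 'a', state 1 emits 'b'."""
    row = {0: p, 1: 1 - p}
    return ({0: row, 1: row}, {0: "a", 1: "b"})

hmcs = [coin_hmc(Fraction(3, 4)), coin_hmc(Fraction(1, 2)), coin_hmc(Fraction(1, 4))]
inits = [0, 0, 0]
guess_a = monitor_multi(hmcs, inits, "aaaa")
guess_b = monitor_multi(hmcs, inits, "abbb")
n_phases = phases_needed(c=0.5, k=3, eps=0.01)
```

The dependence on $1/c^2$ in `phases_needed` is what makes a larger profile constant $c$ valuable in practice.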


5. COMPUTING PROFILES 

In the monitors of Section 4, the constant $c > 0$ determines
the number $N$ of phases needed to ensure a bound on
the error probability. Recall that $c$ is the constant in a
profile $(A, c)$. Any such constant $c$ will do, but the larger it is
the better, since the number of phases used will be smaller.
Note that even the existence of a positive $c$ (as claimed by
Proposition 4) is not obvious. In this section, we prove
Theorem 13, which strengthens Proposition 4, by presenting
a polynomial-time algorithm to compute a positive $c$ and
also the representation of a profile function $A$ in polynomial
time.

Let a test set $Test \subseteq \Sigma^*$ be a set of at most $m$ words,
with $|v| \le m$ for all $v \in Test$. This defines a function
$A_{Test} : Distr(S_1) \times Distr(S_2) \to 2^{\Sigma^m}$ in the following way.
Fix $\psi_1 \in Distr(S_1)$ and $\psi_2 \in Distr(S_2)$. Let $v \in Test$ be
such that

$$v := \arg\max_{w \in Test} |pr_1(\psi_1, w) - pr_2(\psi_2, w)| \qquad (2)$$

and write

$$\lceil v \rceil := \{vw \mid w \in \Sigma^*, |vw| = m\}$$

for the set of strings of length $m$ with $v$ as a prefix. Then
define:

$$A_{Test}(\psi_1, \psi_2) := \begin{cases} \lceil v \rceil & \text{if } pr_1(\psi_1, v) \ge pr_2(\psi_2, v) \\ \Sigma^m - \lceil v \rceil & \text{otherwise} \end{cases}$$

Depending on the case above, $pr_i(\psi_i, A_{Test}(\psi_1, \psi_2))$ is either
$pr_i(\psi_i, v)$ or $1 - pr_i(\psi_i, v)$. Hence:

$$pr_1(\psi_1, A_{Test}(\psi_1, \psi_2)) - pr_2(\psi_2, A_{Test}(\psi_1, \psi_2)) = |pr_1(\psi_1, v) - pr_2(\psi_2, v)| \qquad (3)$$

Given a test set $Test$ and distributions $\psi_1, \psi_2$, a monitor
can compute the word $v$ from (2), and hence the
probabilities $pr_i(\psi_i, A_{Test}(\psi_1, \psi_2))$. Moreover, a monitor
can check whether a given word $w \in \Sigma^m$ is in $A_{Test}(\psi_1, \psi_2)$
by checking whether $v$ is a prefix of $w$.

Theorem 13. Let HMCs $H_1, H_2$ be distinguishable. One
can compute, in polynomial time, a test set $Test \subseteq \Sigma^*$ and
a number $c > 0$ such that $(A_{Test}, c)$ is a profile.

The proof builds on [7] but requires further insights. For the
proof we need the concept of equivalence: For $i = 1, 2$ let
$\psi_i \in Distr(S_i)$. We say that $\psi_1$ is equivalent to $\psi_2$, written
as $\psi_1 \equiv \psi_2$, if $pr_1(\psi_1, u) = pr_2(\psi_2, u)$ holds for all $u \in \Sigma^*$.
We have the following proposition:

Proposition 14. One can compute, in polynomial time,
a test set $Test \subseteq \Sigma^*$ such that for all $\psi_1 \in Distr(S_1)$ and all
$\psi_2 \in Distr(S_2)$ we have:

$$\psi_1 \equiv \psi_2 \iff \forall u \in Test : pr_1(\psi_1, u) = pr_2(\psi_2, u)$$

The algorithm for Proposition 14 uses linear-algebra based
techniques that have been developed for deciding equivalence
of HMCs, see e.g. [23, ?, 13, ?].

We fix $Test$ for the remainder of the section. We define a
distance measure $dist(\psi_1, \psi_2)$ between $\psi_1, \psi_2$ given by

$$dist(\psi_1, \psi_2) := \max_{u \in Test} |pr_1(\psi_1, u) - pr_2(\psi_2, u)| .$$

By Proposition 14 we have:

$$\psi_1 \equiv \psi_2 \iff dist(\psi_1, \psi_2) = 0$$

For the following proposition, linear programming is used to
compute a lower bound on $dist(\psi_1, \psi_2)$ for reachable pairs
$(\psi_1, \psi_2)$ in distinguishable HMCs:

Proposition 15. Let $H_1, H_2$ be distinguishable HMCs.
One can compute, in polynomial time, a rational number
$c > 0$ such that for all reachable pairs $(\psi_1, \psi_2)$ of
distributions we have $dist(\psi_1, \psi_2) \ge c$.

In general there may exist unreachable pairs $(\psi_1, \psi_2)$ of
distributions with $dist(\psi_1, \psi_2) = 0$, even for distinguishable
HMCs. Proposition 15 establishes in particular the nontrivial
fact that for distinguishable HMCs there exists a positive
lower bound on $dist(\psi_1, \psi_2)$ for all reachable pairs $(\psi_1, \psi_2)$.

Example 16. Consider again the HMCs from Figure 4.
We compute the set $Test$ according to the algorithm from
Proposition 14. This yields $Test = \{\varepsilon, a, aa, ba\}$, where $\varepsilon$
denotes the empty word.

In this example, the last symbol of any observation
sequence reveals the state. Hence there are only two reachable
pairs of distributions: one is $(\pi_1, \pi_2)$ with $\pi_1(s_0) = \pi_2(t_0) = 1$,
and the other one is $(\pi_1', \pi_2')$ with $\pi_1'(s_1) = \pi_2'(t_1) = 1$.
Using the definition of $dist$ we compute:

$$dist(\pi_1, \pi_2) = pr_1(\pi_1, aa) - pr_2(\pi_2, aa)$$

$$dist(\pi_1', \pi_2') = pr_1(\pi_1', ba) - pr_2(\pi_2', ba)$$

Hence we have $dist(\psi_1, \psi_2) = 2\delta > 0$ for all reachable pairs
$(\psi_1, \psi_2)$ of distributions.

In order to illustrate some aspects of Proposition 15,
we use linear programming to compute a lower bound on
$dist(\psi_1, \psi_2)$ for all (reachable or unreachable) pairs $(\psi_1, \psi_2)$
of distributions. Concretely, we solve the following linear
program, where $\delta$ is the constant parameter from the HMCs
$H_1, H_2$, and the variables are $x$ and variables encoding
distributions $\psi_1, \psi_2$:

minimize $x \ge 0$

subject to: $\psi_1 \in Distr(S_1)$, $\psi_2 \in Distr(S_2)$,

$-x \le pr_1(\psi_1, u) - pr_2(\psi_2, u) \le x$ for all $u \in Test$.

An optimal solution is x = S and 0i(so) = | “ f nnd 
0i(si) = i + f and 02(to) = | + f and 02(ti) = i - f • 
Hence $x = \delta > 0$ is a lower bound on $dist(\psi_1, \psi_2)$ for all
pairs of distributions, and hence, a fortiori, also for all
reachable pairs. As mentioned after Proposition 15, the
reachability aspect is in general (unlike in this example)
essential for obtaining a positive lower bound. Indeed, the
proof of Proposition 15 takes advantage of further results
from [?].

If we compute a lower bound according to the proof of
Proposition 15, i.e., taking reachability into account, we obtain
$c = 4\delta/(3 + 2\delta)$, which lies strictly between the previously
computed lower bounds $\delta$ and $2\delta$. □
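The distance measure $dist$ and the membership test for $A_{Test}(\psi_1, \psi_2)$ from this section can be sketched as follows. This is an illustration only: the `(trans, label)` encoding, the toy HMCs built by `coin_hmc`, and the test set passed in are assumptions made for the example and are not the HMCs of Figure 4.

```python
from fractions import Fraction

def pr(hmc, dist, word):
    """pr(psi, u): probability of emitting `word` from initial distribution psi."""
    trans, label = hmc
    for obs in word:
        new = {}
        for s, p in dist.items():
            if label[s] == obs:
                for t, q in trans[s].items():
                    new[t] = new.get(t, 0) + p * q
        dist = new
    return sum(dist.values())

def dist_and_witness(h1, psi1, h2, psi2, test):
    """Return (dist(psi1, psi2), v) where v attains the maximum over Test."""
    best_gap, best_word = None, None
    for w in test:
        gap = abs(pr(h1, psi1, w) - pr(h2, psi2, w))
        if best_gap is None or gap > best_gap:
            best_gap, best_word = gap, w
    return best_gap, best_word

def in_profile_set(h1, psi1, h2, psi2, test, word):
    """Check word in A_Test(psi1, psi2) via a prefix test on v."""
    _, v = dist_and_witness(h1, psi1, h2, psi2, test)
    has_prefix = word.startswith(v)
    if pr(h1, psi1, v) >= pr(h2, psi2, v):
        return has_prefix      # A_Test is the set of words with prefix v
    return not has_prefix      # A_Test is the complement of that set

def coin_hmc(p):
    """Toy HMC (hypothetical): state 0 emits 'a', state 1 emits 'b'."""
    row = {0: p, 1: 1 - p}
    return ({0: row, 1: row}, {0: "a", 1: "b"})

h1, h2 = coin_hmc(Fraction(3, 4)), coin_hmc(Fraction(1, 4))
psi = {0: Fraction(1)}
test = ["", "a", "aa", "ba"]
gap, witness = dist_and_witness(h1, psi, h2, psi, test)
```

For these toy inputs the maximum is attained by the word `aa`, so membership in the profile set reduces to checking whether an observed word of length $m$ starts with `aa`.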


With Proposition 15 at hand, we are ready to prove
Theorem 13.

Proof of Theorem 13. Compute $Test$ according to
Proposition 14 and $c > 0$ according to Proposition 15. We
show that $(A_{Test}, c)$ is a profile. Let $(\psi_1, \psi_2)$ be a reachable
pair of distributions. Let $v$ be as in (2). We have:

$$\begin{aligned}
& pr_1(\psi_1, A_{Test}(\psi_1, \psi_2)) - pr_2(\psi_2, A_{Test}(\psi_1, \psi_2)) \\
&= |pr_1(\psi_1, v) - pr_2(\psi_2, v)| && \text{by (3)} \\
&= \max_{w \in Test} |pr_1(\psi_1, w) - pr_2(\psi_2, w)| && \text{by (2)} \\
&= dist(\psi_1, \psi_2) && \text{def. of } dist \\
&\ge c && \text{Proposition 15}
\end{aligned}$$

This completes the proof. □


We have seen that for a given error bound, the number
of observations our monitors need to make depends
quadratically on $\frac{1}{c}$. So it may be beneficial to compute
a larger value of $c$, even if such a computation is expensive.
To this end, for a distribution $\pi \in Distr(S)$, write
$supp(\pi) := \{s \in S \mid \pi(s) > 0\}$. For HMCs $H_1, H_2$, if a
pair $(\psi_1, \psi_2)$ of distributions is reachable, we say that the
pair $(supp(\psi_1), supp(\psi_2))$ is reachable. We have the following
proposition:

Proposition 17. Let $H_1, H_2$ be two distinguishable
HMCs. One can compute, in exponential time:

$$c := \min_{\text{reachable } (S_1', S_2') \in 2^{S_1} \times 2^{S_2}} \ \min_{\psi_1 \in Distr(S_1')} \ \min_{\psi_2 \in Distr(S_2')} \ \max_{U \subseteq \Sigma^m} \big(pr_1(\psi_1, U) - pr_2(\psi_2, U)\big)$$

(Note that $U$ ranges over a set of double-exponential size.)
This value of $c$ is lower-bounded by the value of $c > 0$ from
Theorem 13, and it is part of a profile with

$$A(\psi_1, \psi_2) = \arg\max_{U \subseteq \Sigma^m} \big(pr_1(\psi_1, U) - pr_2(\psi_2, U)\big) .$$

6. APPLICATION: RUNTIME VERIFICATION

In this section we discuss an application of monitors for
runtime verification of stochastic systems. Traditional
verification aims at proving correctness of systems at the time of
their design. This quickly becomes infeasible, in particular
for complex systems with several components and stochastic
behavior, see e.g. [?]. Runtime verification is an alternative
where a monitor observes a system while it is running,
and raises an alarm once a faulty behavior is detected. The
alarm may trigger, e.g., a fail-safe way of shutting the system
down. HMCs were suggested in [23, ?] as models of
partially observable stochastic systems. In this section, the
monitor does not try to distinguish two HMCs; rather, it tries
to distinguish correct and faulty behavior of a single HMC.

Definitions. For a probability measure $P$ and measurable
sets $C, D$ such that $P(C) > 0$, we let $P(D \mid C)$ denote
the value $P(C \cap D)/P(C)$, which is the conditional probability
of $D$ given $C$. A classifying HMC (cHMC) is a quadruple
$H = (G, O, s_0, Class)$, where $(G, O, s_0)$ is an HMC and $Class$
is a condition classifying each bottom strongly connected
component (BSCC) of $H$ as bad or good. For a cHMC and
a state $s \in S$ we define:

$$Bad_s := \{s s_1 s_2 \cdots \in \{s\}S^\omega \mid \exists i : s_i \text{ is in a bad BSCC}\}$$

$$Good_s := \{s s_1 s_2 \cdots \in \{s\}S^\omega \mid \exists i : s_i \text{ is in a good BSCC}\}$$

Define $Bad := Bad_{s_0}$ and $Good := Good_{s_0}$. The events
$Bad$ and $Good$ are disjoint and measurable. By fundamental
properties of Markov chains we have

$$P(Bad \cup Good) = P(Bad) + P(Good) = 1 .$$

To avoid trivialities we assume that $P(Bad), P(Good) > 0$
(this can be checked in polynomial time by graph
reachability). We say that a cHMC $H$ is monitorable if for every
$\varepsilon > 0$ there exists a monitor $M$ such that

$$P([L(M)] \mid Bad) \ge 1 - \varepsilon \quad\text{and}\quad P([L(M)] \mid Good) \le \varepsilon .$$

In [26] the authors define and study monitorability of pairs
$(H_0, A)$ where $H_0$ is an HMC and $A$ is a deterministic Streett
automaton. One can compute, in polynomial time, the product
of $H_0$ and $A$. That product is a cHMC $H$ as defined
above. Then $(H_0, A)$ is monitorable (in the sense of [26]) if
and only if $H$ is monitorable (in the sense defined above).


A construction similar to one that was given in [4, Section 3]
allows us, for a given cHMC $H$, to construct two HMCs
$H_1, H_2$ that exhibit the bad and the good behavior of $H$
according to their conditional probabilities:

Proposition 18. Let $H$ be a cHMC with
$P(Bad), P(Good) > 0$. Then one can compute, in
polynomial time, HMCs $H_1, H_2$ such that for all measurable
events $E \subseteq S^\omega$ we have

$$P_1(E) = P(E \mid Bad) \quad\text{and}\quad P_2(E) = P(E \mid Good) .$$
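The construction behind this proposition (given in Appendix C) rescales each transition probability $\phi(s,t)$ by the ratio $P_t(Bad_t)/P_s(Bad_s)$ and keeps only states from which a bad BSCC is reachable. A minimal sketch of that rescaling step follows; it assumes the probabilities $P_s(Bad_s)$ are already available in a dictionary `bad_prob` (computing them requires solving a linear system, which is omitted here), and the four-state chain is a made-up example.

```python
from fractions import Fraction

def condition_on_bad(trans, bad_prob):
    """Transition function of H_1:
    phi_1(s, t) = phi(s, t) * P_t(Bad_t) / P_s(Bad_s), restricted to
    the states S_1 = {s | P_s(Bad_s) > 0}.
    """
    keep = {s for s, p in bad_prob.items() if p > 0}
    return {
        s: {t: q * bad_prob[t] / bad_prob[s]
            for t, q in trans[s].items() if t in keep}
        for s in keep
    }

# Hypothetical cHMC: "bad" and "good" are absorbing BSCCs.
trans = {
    "s0": {"s1": Fraction(1, 2), "good": Fraction(1, 2)},
    "s1": {"bad": Fraction(1, 3), "good": Fraction(2, 3)},
    "bad": {"bad": Fraction(1)},
    "good": {"good": Fraction(1)},
}
# P_s(Bad_s) for each state, assumed to be precomputed.
bad_prob = {"s0": Fraction(1, 6), "s1": Fraction(1, 3),
            "bad": Fraction(1), "good": Fraction(0)}

h1_trans = condition_on_bad(trans, bad_prob)
```

In the example every remaining row sums to 1 again, which is what makes the rescaled function a transition function of a Markov chain.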

It follows from Proposition 18 that distinguishing and
monitoring are equivalent: Given HMCs $H_1, H_2$, we can
combine them into a single cHMC $H$ by introducing a new
initial state $s_0$, which branches to the initial states of $H_1, H_2$
with probability $1/2$ each. We classify the BSCCs of $H_1$ and
of $H_2$ as bad and good, respectively. Then for any $E \subseteq \Sigma^\omega$
we have

$$P_1(E) = P(\{O(s_0)\}E \mid Bad) \quad\text{and}\quad P_2(E) = P(\{O(s_0)\}E \mid Good),$$

so any monitor for $H$ can be translated in a straightforward
way into a monitor that distinguishes $H_1$ and $H_2$.
Conversely, given a cHMC $H$, we can compute $H_1, H_2$ according
to Proposition 18. Then any monitor that distinguishes $H_1$
and $H_2$ also monitors $H$.

By combining this observation with Theorem 3 we obtain:

Corollary 19. One can decide in polynomial time
whether a given cHMC $H$ is monitorable.

Another kind of monitorability, called strong monitorability
[26], was shown PSPACE-complete in [26]. Strong
monitorability implies monitorability.

Using Proposition 18 again, the monitors from Section 4
apply to monitoring cHMCs. For instance, the monitor with
one-sided error can guarantee that (a) given that the behavior
is faulty then an alarm is raised with probability 1 and
within short expected time, and (b) given that the behavior
is correct then probably no alarm is raised.

7. CONCLUSIONS 

In this paper we have considered the distinguishability
problem for HMCs. We have shown that it is decidable in
polynomial time.

We have presented two likelihood-based monitors $M_1, M_2$
for distinguishing between HMCs $H_1, H_2$ based on the
sequences of observations generated by them. The monitor $M_2$
makes a decision after running for a fixed number of
observations and exhibits two-sided error. It processes $O(\log \frac{1}{\varepsilon})$
observations to ensure an error probability of at most $\varepsilon$.
The monitor $M_1$ has only one-sided error. The expected
number of observations it processes to identify a sequence
generated by $H_1$ is $O(\log \frac{1}{\varepsilon})$ to guarantee an error probability
of at most $\varepsilon$ on sequences generated by $H_2$. We have also
provided a monitor for distinguishing multiple HMCs. All
error analyses rely on martingale techniques, in particular,
Azuma's inequality.

Polynomial-time algorithms are provided which,
for the monitor $M_2$, compute the number of observations
that guarantees a given upper bound on the error, and, for
the monitor $M_1$, compute the expected number of observations
of $H_1$ before which an alarm is raised, for a given error bound
on the probability of raising an alarm on inputs generated
by $H_2$. These algorithms employ linear-programming based
techniques for computing profiles.

We have discussed an application to runtime verification
of stochastic systems. The monitorability problem for
cHMCs is polynomial-time equivalent to distinguishability,
and hence decidable in polynomial time. We have shown
that the monitors developed in this paper can be adapted
so that they monitor cHMCs.

One direction for future work is to improve the efficiency
of computing a good lower bound on $c$. We have seen that
this bound strongly influences the number of observations
the monitor needs to make, so the bound may determine
the applicability of a monitor in practice. Another direction
is to develop a notion of a monitor for HMCs that are
neither equivalent nor distinguishable. Such monitors might
still attempt to distinguish between the HMCs for as many
runs as possible.
APPENDIX

A. PROOFS OF SECTION 4 

We prove Theorem 7 from the main text.

Theorem 7. Consider the monitor $M_2'$ running $N$
phases. Let $L(M_2') \subseteq \Sigma^\omega$ be the set of observation sequences
for which $M_2'$ outputs 1. Then,

$$P_1([L(M_2')]_1) \ge 1 - \exp\left(-\frac{c^2}{18} \cdot N\right) \quad\text{and}\quad P_2([L(M_2')]_2) \le \exp\left(-\frac{c^2}{18} \cdot N\right) .$$

Proof. Let $\psi_{1,k}$, $\psi_{2,k}$ and $X_k$ denote the values of $\psi_1$, $\psi_2$
and $x$ directly after the $k$-th phase, for $k \ge 0$. Initially,
$\psi_{1,0} = \delta_{s_{1,0}}$, $\psi_{2,0} = \delta_{s_{2,0}}$ and $X_0 = 0$. For $k \ge 1$, let
$u_k \in \Sigma^{k \cdot m}$ be the sequence of all observations received until
and including phase $k$. By induction on $k$, it is easy to see
that $\psi_{i,k} = cd_i(\delta_{s_{i,0}}, u_k)$ for $k \ge 0$, $i = 1, 2$.

Note that $X_k$ depends only on $u_k$. In the following we view
$X_0, X_1, \ldots$ as a sequence of random variables. (Formally,
for $k \ge 0$ the random variable $X_k$ is a function of type
$X_k : \{s_{1,0}\}S_1^\omega \to \mathbb{Q}$.) We also define a sequence $Y_0, Y_1, \ldots$ of
random variables with $Y_k = X_k + k \cdot \frac{c}{2}$. Note that $X_0 = Y_0 = 0$.

We show that the sequence of random variables $Y_0, Y_1, \ldots$
forms a supermartingale in $H_1$. Let $k \ge 0$. Fix $u_k \in \Sigma^{k \cdot m}$.
Recall that this determines $X_k$. For the conditional expected
value of $X_{k+1}$ given prefix $u_k$ we have:

$$E(X_{k+1} \mid [\{u_k\}\Sigma^\omega]_1) = X_k + d, \qquad (4)$$

where $d$ denotes the expected change of $x$ after phase $k+1$.
Recall that $\psi_{i,k} = cd_i(\delta_{s_{i,0}}, u_k)$ for $i = 1, 2$. Let
$p_i = pr_i(\psi_{i,k}, A(\psi_{1,k}, \psi_{2,k}))$, for $i = 1, 2$. Assume $p_1 + p_2 \le 1$.
According to our rule for updating $x$ we then have:

$$d = p_1 \cdot (-1) + (1 - p_1) \cdot \frac{p_1 + p_2}{2 - p_1 - p_2} = \frac{p_2 - p_1}{2 - p_1 - p_2}$$

This is negative. Moreover, by the definition of a profile we
have $p_1 - p_2 \ge c > 0$. Furthermore, $1 \le 2 - p_1 - p_2 \le 2$.
Hence:

$$d \le \frac{p_2 - p_1}{2} \le -\frac{c}{2}$$

Combining this with (4) and the definition of $Y_k$ we obtain:

$$E(Y_{k+1} \mid [\{u_k\}\Sigma^\omega]_1) = Y_k + \frac{c}{2} + d \le Y_k \qquad (5)$$

Now assume $p_1 + p_2 > 1$. Then we have:

$$d = -p_1 \cdot \frac{2 - p_1 - p_2}{p_1 + p_2} + (1 - p_1) \cdot 1 = \frac{p_2 - p_1}{p_1 + p_2} \le \frac{p_2 - p_1}{2} \le -\frac{c}{2},$$

so (5) again follows. Hence we have shown that $Y_0, Y_1, \ldots$ is
a supermartingale in $H_1$.

By definition of the update rule we have $|X_{k+1} - X_k| \le 1$
and hence $|Y_{k+1} - Y_k| \le 1 + \frac{c}{2} \le \frac{3}{2}$. Applying Azuma's
inequality (see, e.g., [?]) to the supermartingale $Y_0, Y_1, \ldots$
we obtain:

$$P_1\{X_N \ge 0\} = P_1\left\{Y_N \ge \frac{c}{2} \cdot N\right\} \le \exp\left(-\frac{(\frac{c}{2} N)^2}{2 N (\frac{3}{2})^2}\right) = \exp\left(-\frac{c^2}{18} \cdot N\right)$$

Hence,

$$P_1([L(M_2')]_1) = P_1\{X_N < 0\} \ge 1 - \exp\left(-\frac{c^2}{18} \cdot N\right) .$$

The proof of the second inequality in the statement is
similar with the following modifications. The random variables
$X_k'$ are defined like $X_k$, but on sequences of states in $H_2$
rather than $H_1$. Define $Y_k' = X_k' - k \cdot \frac{c}{2}$. The sequence
$Y_0', Y_1', \ldots$ is now a submartingale. Applying Azuma's
inequality to this submartingale now leads to the second
inequality claimed in the statement. □

The following lemma is used for the proof of Theorem 5.

Lemma 20. Let $S$ be a countable set. Let $\psi_1, \psi_2$ be
probability distributions over $S$. For $i \in \{1, 2\}$ and any event
$V \subseteq S$ define $\psi_i(V) := \sum_{v \in V} \psi_i(v)$. Define

$$W := \{s \in S \mid \psi_1(s) \ge \psi_2(s)\} .$$

Then

$$\max_{V \subseteq S} \big(\psi_1(V) - \psi_2(V)\big) = \psi_1(W) - \psi_2(W),$$

i.e., $W$ maximizes the probability difference over all events.

Proof. If $s \in S$ with $s \notin V$ and $\psi_1(s) \ge \psi_2(s)$, then

$$\psi_1(V \cup \{s\}) - \psi_2(V \cup \{s\}) = \psi_1(V) - \psi_2(V) + \psi_1(s) - \psi_2(s) \ge \psi_1(V) - \psi_2(V) .$$

Similarly, if $s \in S$ with $s \in V$ and $\psi_1(s) < \psi_2(s)$, then

$$\psi_1(V \setminus \{s\}) - \psi_2(V \setminus \{s\}) = \psi_1(V) - \psi_2(V) - \psi_1(s) + \psi_2(s) \ge \psi_1(V) - \psi_2(V) .$$

The statement of the lemma follows. □

Now we prove Theorem 5 from the main text.

Theorem 5. Consider the monitor $M_2$ that reads
the first $N \cdot m$ observations. Let $L(M_2) \subseteq \Sigma^\omega$ be the set
of observation sequences for which $M_2$ outputs 1. Then we
have

$$P_1([L(M_2)]_1) - P_2([L(M_2)]_2) \ge 1 - 2 \exp\left(-\frac{c^2}{18} \cdot N\right) .$$

Hence,

$$P_1([L(M_2)]_1) \ge 1 - 2 \exp\left(-\frac{c^2}{18} \cdot N\right) \quad\text{and}\quad P_2([L(M_2)]_2) \le 2 \exp\left(-\frac{c^2}{18} \cdot N\right) .$$

Proof. Let $N > 0$. We can write $L(M_2) = W\Sigma^\omega$
where $W \subseteq \Sigma^{N \cdot m}$ denotes the set of observation prefixes
of length $N \cdot m$ on which $M_2$ outputs 1. Then we have:

$$W = \{u \in \Sigma^{N \cdot m} \mid pr_1(\delta_{s_{1,0}}, u) \ge pr_2(\delta_{s_{2,0}}, u)\} = \{u \in \Sigma^{N \cdot m} \mid P_1([\{u\}\Sigma^\omega]_1) \ge P_2([\{u\}\Sigma^\omega]_2)\}$$

(We left the output of the monitor unspecified when the
likelihood ratio is equal to 1. As a consequence, the
inequalities above might be strict. This does not affect the rest of
the argument.) Using Lemma 20 we obtain the following
inequality:

$$P_1([W\Sigma^\omega]_1) - P_2([W\Sigma^\omega]_2) \ge P_1([V\Sigma^\omega]_1) - P_2([V\Sigma^\omega]_2) \qquad (6)$$

for all $V \subseteq \Sigma^{N \cdot m}$. In particular, this holds for the prefixes
of length $N \cdot m$ of $L(M_2')$ from Theorem 7. Hence we have:

$$\begin{aligned}
& P_1([L(M_2)]_1) - P_2([L(M_2)]_2) \\
&= P_1([W\Sigma^\omega]_1) - P_2([W\Sigma^\omega]_2) && \text{as } L(M_2) = W\Sigma^\omega \\
&\ge P_1([L(M_2')]_1) - P_2([L(M_2')]_2) && \text{by (6)} \\
&\ge 1 - 2 \exp\left(-\frac{c^2}{18} \cdot N\right) && \text{by Theorem 7}
\end{aligned}$$

This concludes the proof of Theorem 5. □
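Lemma 20, which drives the optimality argument above, is easy to check by brute force on a small example. The following sketch enumerates all events over a three-element set and compares the best difference with the one attained by $W$; the two distributions are arbitrary illustrative values.

```python
from fractions import Fraction
from itertools import chain, combinations

def mass(psi, event):
    """psi(V): total probability of the event V."""
    return sum(psi[s] for s in event)

def best_event(psi1, psi2):
    """The set W = {s | psi1(s) >= psi2(s)} from Lemma 20."""
    return {s for s in psi1 if psi1[s] >= psi2[s]}

def brute_force_max(psi1, psi2):
    """Maximum of psi1(V) - psi2(V) over all subsets V of the state set."""
    states = list(psi1)
    subsets = chain.from_iterable(
        combinations(states, r) for r in range(len(states) + 1))
    return max(mass(psi1, v) - mass(psi2, v) for v in subsets)

psi1 = {"x": Fraction(1, 2), "y": Fraction(1, 3), "z": Fraction(1, 6)}
psi2 = {"x": Fraction(1, 4), "y": Fraction(1, 4), "z": Fraction(1, 2)}
w = best_event(psi1, psi2)
w_gap = mass(psi1, w) - mass(psi2, w)
```

In the proof of Theorem 5 the same statement is applied with the set of observation prefixes of length $N \cdot m$ in place of the three-element set used here.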


Computation for Example© We analyse the likelihood- 
based monitor M 2 for the HMCs of Figure 3] The moni¬ 
tor M 2 makes N ■ m = 4A observations. It is easy to see 
that it outputs 1 if and only if it reads at least as many 
a-symbols as fe-symbols, i.e., the number of read a-symbols 
is at least 2N. Hence we have: 


P2([£(M2)]2) 


4N 


= E 

i=2N 




> 




(4jV)! 

(2iV)! • (2A)! V4 ) 

2^^ . (4Ar - 1) . (4iV - 3) • • • 5 • 3 
(2iV)-(2Ar-l)-(2iV-2)---2-l 


o2iV 

^ _f_ r)2N-l 

- {2N) ■ 




2N 


For X G [0, |] we have ln(l — x) > —2x. So we can continue 
as follows: 


P2([£(M2)]2) 

> ^ - exp (-165" AT) 

> exp {^—173^ N) for large N 

It follows that for small e, an inequality e > P 2 ([£(AL 2 )] 2 ) 
implies that N > -^S~^ In 4. This completes the calculation 
for the example. □ 

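We prove Lemma 9 from the main text.

Lemma 9. $P_1([U_N]_1) \ge 1 - 4 \exp(-\frac{c^2}{36} \cdot N)$.

Proof. By contradiction. Contrary to the lemma, assume:

$$P_1([U_N]_1) < 1 - 4 \exp\left(-\frac{c^2}{36} \cdot N\right) \qquad (7)$$

Let $V_N := L(M_2) - U_N$, and let $W_N$ denote the set of all
prefixes, of length $N \cdot m$, of sequences in $V_N$. Clearly, for all
$v \in W_N$:

$$\exp\left(-\frac{c^2}{36} \cdot N\right) < lr(v) \le 1$$

It follows for all $v \in W_N$:

$$pr_1(\delta_{s_{1,0}}, v) = \frac{pr_2(\delta_{s_{2,0}}, v)}{lr(v)} < \exp\left(\frac{c^2}{36} \cdot N\right) \cdot pr_2(\delta_{s_{2,0}}, v)$$

Hence,

$$P_1([V_N]_1) = \sum_{v \in W_N} pr_1(\delta_{s_{1,0}}, v) < \exp\left(\frac{c^2}{36} \cdot N\right) \cdot \underbrace{\sum_{v \in W_N} pr_2(\delta_{s_{2,0}}, v)}_{= P_2([V_N]_2)} . \qquad (8)$$

From Theorem 5 we know that

$$P_2([V_N]_2) \le P_2([L(M_2)]_2) \le 2 \exp\left(-\frac{c^2}{18} \cdot N\right) .$$

By combining this with (8), we get:

$$P_1([V_N]_1) < 2 \exp\left(\frac{c^2}{36} \cdot N\right) \cdot \exp\left(-\frac{c^2}{18} \cdot N\right) = 2 \exp\left(-\frac{c^2}{36} \cdot N\right) \qquad (9)$$

We have $P_1([L(M_2)]_1) = P_1([U_N]_1) + P_1([V_N]_1)$. Using (7)
and (9), we get:

$$P_1([L(M_2)]_1) < 1 - 2 \exp\left(-\frac{c^2}{36} \cdot N\right) \le 1 - 2 \exp\left(-\frac{c^2}{18} \cdot N\right)$$

But this contradicts Theorem 5. □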

We prove Theorem 10 from the main text:

Theorem 10. Consider the monitor $M_1$ with threshold
parameter $low \in (0,1]$. Let $L(M_1) \subseteq \Sigma^\omega$ be the set of
observation sequences for which $M_1$ terminates (and hence
outputs 1). Then,

$$P_1([L(M_1)]_1) = 1 \quad\text{and}\quad P_2([L(M_1)]_2) \le low.$$

Proof. Let $N_0$ be the smallest integer such that
$\exp(-\frac{c^2}{36} \cdot N_0) \le low$. Clearly, for all $N \ge N_0$ we have
$L(M_1) \supseteq U_N$ where $U_N$ is the set defined at the beginning
of Section 4.3. From this observation and Lemma 9 we see
that for all $N \ge N_0$:

$$P_1([L(M_1)]_1) \ge 1 - 4 \exp\left(-\frac{c^2}{36} \cdot N\right)$$

From this, we get

$$P_1([L(M_1)]_1) \ge \lim_{N \to \infty} \left(1 - 4 \exp\left(-\frac{c^2}{36} \cdot N\right)\right) = 1.$$

Let $X = \{v \in (\Sigma^m)^* \mid pr_1(\delta_{s_{1,0}}, v) > 0,\ lr(v) \le low,\ \forall i < |v| : lr(v[i]) > low\}$.
Intuitively, $X$ is the set of shortest
observation sequences whose length is a multiple of $m$
and whose likelihood ratio is $\le low$. It is easy to see that
$L(M_1) = X\Sigma^\omega$. Observe that there do not exist two distinct
sequences $v_1, v_2 \in X$ such that $v_1$ is a prefix of $v_2$.

$$P_2([L(M_1)]_2) = \sum_{v \in X} pr_2(\delta_{s_{2,0}}, v) \le low \cdot \sum_{v \in X} pr_1(\delta_{s_{1,0}}, v) = low \cdot P_1([L(M_1)]_1) \le low$$

□


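We prove Proposition 11 from the main text.

Proposition 11. $E_1(T) \le \frac{36m}{c^2} \cdot \log \frac{1}{low} + \frac{147m}{c^2} \cdot low + m$,
where $E_1(T)$ is the expected value of $T$ under the probability
measure $P_1$.

Proof. Since $T$ is a nonnegative integer valued random
variable, from [20], we see that $E_1(T) = \sum_{i \ge 0} P_1\{T > i\}$.
Since $M_1$ only decides after each phase, i.e., after reading
each successive sequence of $m$ observations, we see
that $E_1(T) = \sum_{N \ge 0} m \cdot P_1\{T > N \cdot m\}$. Let $N_0$ be
the smallest integer such that $\exp(-\frac{c^2}{36} \cdot N_0) \le low$, i.e.,
$N_0 = \lceil \frac{36}{c^2} \log \frac{1}{low} \rceil$. We have:

$$E_1(T) = \sum_{N=0}^{N_0 - 1} m \cdot P_1\{T > N \cdot m\} + \sum_{N \ge N_0} m \cdot P_1\{T > N \cdot m\} \le m \cdot N_0 + m \cdot \sum_{N \ge N_0} P_1\{T > N \cdot m\} \qquad (10)$$

For $N \ge 0$, let $X_N = \{u \in \Sigma^\omega \mid lr(u[N \cdot m]) > low\}$.
Observe that, for $N \ge N_0$, $X_N \subseteq \Sigma^\omega - U_N$. Further,

$$P_1\{T > N \cdot m\} \le P_1([X_N]_1) \le P_1([\Sigma^\omega - U_N]_1) \le 4 \exp\left(-\frac{c^2}{36} \cdot N\right) \quad \text{from Lemma 9} \qquad (11)$$

From (10) and (11), we get

$$\begin{aligned}
E_1(T) &\le m \cdot N_0 + 4m \cdot \sum_{N \ge N_0} \exp\left(-\frac{c^2}{36} \cdot N\right) \\
&= m \cdot N_0 + 4m \cdot \sum_{n \ge 0} \exp\left(-\frac{c^2}{36} \cdot (N_0 + n)\right) \\
&= m \cdot N_0 + 4m \cdot \exp\left(-\frac{c^2}{36} \cdot N_0\right) \cdot \sum_{n \ge 0} \exp\left(-\frac{c^2}{36} \cdot n\right) \\
&\le m \cdot N_0 + 4m \cdot low \cdot \frac{1}{1 - \exp(-\frac{c^2}{36})} \\
&\le \frac{36m}{c^2} \cdot \log \frac{1}{low} + m + 4m \cdot low \cdot \frac{1}{1 - \exp(-\frac{c^2}{36})}
\end{aligned} \qquad (12)$$

substituting for $N_0$.

By using a Taylor series expansion of $\exp(-\frac{c^2}{36})$, we get an
infinite sum in which the signs of the terms alternate starting
with a positive sign, and in which the absolute values of the
terms decrease monotonically. Hence we can upper bound
its value by the sum of the first three terms, which is
$1 - \frac{c^2}{36} + \frac{c^4}{2 \cdot 36^2}$. From this, we see that
$1 - \exp(-\frac{c^2}{36}) \ge \frac{c^2}{36} - \frac{c^4}{2 \cdot 36^2} = \frac{c^2 (72 - c^2)}{2 \cdot 36^2}$.
Using this, after simplification, we see that

$$4m \cdot low \cdot \frac{1}{1 - \exp(-\frac{c^2}{36})} \le \frac{8 \cdot 36^2 \cdot m \cdot low}{c^2 \cdot (72 - c^2)} \le \frac{8 \cdot 36^2 \cdot m \cdot low}{71 \cdot c^2} \le \frac{147 \cdot m \cdot low}{c^2} \qquad (13)$$

where the second inequality holds since $c \le 1$.

Using the bound of (13) in (12), we obtain the statement. □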


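We prove Theorem 12 from the main text.

Theorem 12. Consider the monitor $M$. Let $i \in \{1, \ldots, k\}$
and let $L_i \subseteq \Sigma^{N \cdot m}$ be the set of observation
sequences for which $M$ outputs $i$. Then we have for all
$j \in \{1, \ldots, k\} - \{i\}$:

$$P_i([L_i \Sigma^\omega]_i) \ge 1 - 2k \cdot \exp\left(-\frac{c^2}{18} \cdot N\right) \quad\text{and}\quad P_j([L_i \Sigma^\omega]_j) \le 2 \exp\left(-\frac{c^2}{18} \cdot N\right)$$

Proof. For $j \in \{1, \ldots, k\} - \{i\}$ define:

$$L_{i,j} := \begin{cases} \{u \in \Sigma^{N \cdot m} \mid pr_i(\delta_{s_{i,0}}, u) \ge pr_j(\delta_{s_{j,0}}, u)\} & \text{if } i < j \\ \{u \in \Sigma^{N \cdot m} \mid pr_i(\delta_{s_{i,0}}, u) > pr_j(\delta_{s_{j,0}}, u)\} & \text{if } i > j \end{cases}$$

By Theorem 5 we have:

$$P_i([L_{i,j} \Sigma^\omega]_i) \ge 1 - 2 \exp\left(-\frac{c^2}{18} \cdot N\right) \quad\text{and}\quad P_j([L_{i,j} \Sigma^\omega]_j) \le 2 \exp\left(-\frac{c^2}{18} \cdot N\right) \qquad (14)$$

We have:

$$\begin{aligned}
1 - P_i([L_i \Sigma^\omega]_i) &= 1 - P_i\Big(\Big[\bigcap_{j \ne i} L_{i,j} \Sigma^\omega\Big]_i\Big) && L_i = \bigcap_{j \ne i} L_{i,j} \\
&\le \sum_{j \ne i} P_i([(\Sigma^{N \cdot m} - L_{i,j}) \Sigma^\omega]_i) && \text{union bound} \\
&= \sum_{j \ne i} \big(1 - P_i([L_{i,j} \Sigma^\omega]_i)\big) \\
&\le \sum_{j \ne i} 2 \exp\left(-\frac{c^2}{18} \cdot N\right) && \text{by (14)} \\
&\le 2k \cdot \exp\left(-\frac{c^2}{18} \cdot N\right)
\end{aligned}$$

The first inequality follows. Further we have:

$$\begin{aligned}
P_j([L_i \Sigma^\omega]_j) &\le P_j([L_{i,j} \Sigma^\omega]_j) && L_i \subseteq L_{i,j} \\
&\le 2 \exp\left(-\frac{c^2}{18} \cdot N\right) && \text{by (14)}
\end{aligned}$$

This proves the second inequality. □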

B. PROOFS OF SECTION 5 

We prove Proposition M from the main text. 

Proposition 1141 One can compute, in polynomial 
time, a test set Test C E* such that for all tpi € Distr{Si) 
and all ip 2 € Distr{S 2 ) we have: 

'ipi=%l )2 <=> yu e Test : pr-^{'ip\,u) = pr 2 {'ip 2 ,u) 


Proof. For both i = 1,2 and all a G E define a matrix 
Mi{a) G [0,1]®*^®“ with 




<f>i{s,t) if Oi(s) = a 
0 otherwise 


for all G Si. 


For i = 1,2 write rii G {1}'^* for the column vector all 
whose entries are 1. For i = 1,2 and for any string 
u = ai .. .Ok G E* define the column vector rii(u) G 
[0,1]®* with rii{u) = Mi{ai) ■ ■ Mi{au) ■ rp. For all 

s £ Si we have, according to the definitions, the equality 
(rji{u)')^ — Pi_s([uE“]i), which is the probability that the 
string u is output by Hi starting from s. For a distribution 
'ipi G Distr{Si) write (ipi) G [0,1]®* for the stochastic row 
vector with = i/’i(s). According to the definitions, we 

have pr^{ 1 p\,u) = (ifi) ■ rii{u) for all u G E*. Define 

r]{u) := ( ^ ^ G [0, 1 ]®!'-'®^ for all u G E*. 


The direction is immediate. For the converse 

assume = pr 2 {ip 2 ,u) for all u G Test. Then we 

have for all u G Test: 

0 = pr 1 (i/)i, u) - prj {i ’2 , u) 

= (V’l) ■?7l(u) - (tp2)-V2{u) 

= ((V’l) {V’2» • v{u) 

Since {rj{u) \ u G Test} = 17 is a basis for V, it follows that 
({l/>l) (V'2)) is orthogonal to V. We have already argued 

that this implies 'ipi = 'ip 2 . This completes the proof. □ 

We prove Proposition [m from the main text. 

Proposition [T^ LetH\,H 2 be distinguishable HMCs. 
One can compute, in polynomial time, a rational number c > 
0 such that for all reachable pairs {ipi,'ip 2 ) of distributions 
we have dist{ipi,tp 2 ) > c. 

Proof. We say a state si G Si dominates a distribution 
tpi G Distr{Si) if ipi{si) > ipi(ti) holds for all t\ G Si. 
We say a pair of states (si,S 2 ) is reachable if there exists 
a reachable pair of distributions ('i/'i,i/’ 2 ) with tpi{si) > 0 
for both i = 1,2. Note that one can compute, in poly¬ 
nomial time, from Hi , H 2 the set of all reachable pairs of 
states. For si G Si define Unreach{si) := {s 2 G S 2 | 
(si,S 2 ) is not reachable.}. For every si G Si, consider the 
following linear program £P(si) over a real variable x and 
over real variables encoding distributions ipi G Distr{Si) 
and 'tp 2 G Distr{S 2 ): 


Hence we have '01 = 02 if and only if 

((0i) ( 02 )) • p(m) = 0 for all u G E*. 

It follows that we have 0i = 02 if and only if ((0i) ( 02 )) 

is orthogonal to the vector space, say V, spanned by {ri{u) \ 
u G E*}. Define 

M(.) (";<“> „»„,) € 

for all M G E*. Note that T]{au) = M{a)r]{u) holds for 
all a G E and all u G E*. Hence the vector space V can 
be equivalently described as the smallest vector space that 
contains 'q{e) (where e denotes the empty string, i.e., all 
entries of 77 (e) are ±1) and satisfies M{a)v G V for all a G E 
and all u G V. 

We now give a polynomial-time algorithm for computing 
a set Test C E*. The algorithm is as follows: Initialize 
Test := {e} where e denotes the empty string. Then, as 
long as there are a G E and w G Test such that M(a)r](w) 
is linearly independent of { 77 (u) j u G Test}, set Test := 
Test U {aw}. 

Now we show that the computed set Test has the proper¬ 
ties claimed in the proposition. Since V is the smallest vec¬ 
tor space that contains 77(e) and satisfies M{a)v G V for all 
a G E and all v £ V, the set U := {77(a) j u £ Test} for the 
computed set Test is a basis for V. Since V is a subspace of 
R®iu ®2 ^ the dimension of V is at most 771 = |<S'i j -I- |S' 2 |. Since 
[7 is a basis, we have j Test\ < m. Since every string that the 
algorithm adds to Test is only one letter longer than some 
other string already in Test, it follows that |u| < j Test\ < m 
holds for all u £ Test. Finally we show for all 0i G Distr{Si) 
and all 02 G Distr{S 2 ): 

Vw G Test : pr^{'ilii,u) — pr 2 ( 02 ,M) 


minimize x > 0 
subject to: 0i G Distr{Si) 

02 G Distr{S 2 ) 

Si dominates 0i 

02(s2) = 0 for all S 2 G Unreach[si) 

— x< prj^(0i, u) — pr2(02, u) < X 
for all u £ Test. 

Note that all constraints are linear (in)equalities. In partic¬ 
ular, we have 777^(01, u) = JDsgSi '*/'i(®) ' 7’i.s([aE“]i). The 
probabilities 'Pi,a{[uT,‘^]i) can be computed in polynomial 
time. (Those probabilities are computed already when com¬ 
puting the set Test according to the proof of Proposition 1 141 
they are the probabilities in the vectors rii{u) defined there.) 

For every si G Si, let c(si) denote the optimum solution 
(minimizing x) of CV{si). Define c := min{c(si) j si G Si}. 
Note that c can be computed in polynomial time. We show 
that c has the properties claimed by the proposition. 

First we show that d7st(0i, 02) > c holds for all reachable 
pairs (01,02). Towards a contradiction suppose that there is 
a reachable pair (0i,02) with cijst(0i,02) < c. Let si G Si 
be a state that dominates 0i. Since (0i, 02) is reachable, we 
have 02 ( 52 ) = 0 for all S 2 G Unreach{si). By the definition 
of d7st(0i,02), we have 

-*51(01,02) < 777^(01,'u) -pr 2 (02,-a) < *5f(01,02) 

for all u £ Test. It follows that x := *5t(0i,02) along with 
01,02 is a feasible solution of the linear program CP{si). 
Since c(si) is optimal, we have c(5i) < *5t(0i,02). By our 
assumption we have *5t(0i,02) < c, hence c(si) < c. But 
by the definition of c we have c < c(5i), a contradiction. We 
conclude that *st( 0 i, 02 ) > c holds for all reachable pairs 
Wl, V’2)- 


01 = 02 


Finally, we show c > 0. Towards a contradiction suppose 
c = 0. So by definition of c there is si € Si with c(si) = 0. 
Thus, TP(si) has a solution with a; = 0. That is, there 
exist ipi G Distr{Si) and ^2 G Distr^S^) such that si dom¬ 
inates i/ii, and 

ip 2 {s 2 ) = 0 holds for all S 2 € Unreach{si), (15) 

and pri{tp-i_,u) = pr 2 {ip 2 ,u) holds for all u G Test. By 
Proposition M the last fact implies 

V’l = '*/'2 • ( 16 ) 

Since si dominates ipi, we have 

V>i(si)>0. (17) 


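It follows directly from [0, Theorem 21] that (15)-(17)
together imply that we have $d(H_1, H_2) < 1$ for the total
variation distance $d$ defined in the beginning of Section [S].
But then Proposition [T] implies that $H_1, H_2$ are not
distinguishable, which is a contradiction. Hence $c > 0$ must hold.
This concludes the proof. □

We prove Proposition 17 from the main text.

Proposition 17. Let $H_1, H_2$ be two distinguishable
HMCs. One can compute, in exponential time:

\[ c := \min_{\text{reachable } (S_1', S_2') \in 2^{S_1} \times 2^{S_2}} \; \min_{\psi_1 \in \mathit{Distr}(S_1')} \; \min_{\psi_2 \in \mathit{Distr}(S_2')} \; \max_{U \subseteq \Sigma^m} \bigl( \mathit{pr}_1(\psi_1, U) - \mathit{pr}_2(\psi_2, U) \bigr) \]

Proof. The reachable pairs $(S_1', S_2') \in 2^{S_1} \times 2^{S_2}$ can be
computed in exponential time. So it suffices to show that one
can compute, for a fixed reachable pair $(S_1', S_2') \in 2^{S_1} \times 2^{S_2}$,
the value

\[ c_{S_1', S_2'} := \min_{\psi_1 \in \mathit{Distr}(S_1')} \; \min_{\psi_2 \in \mathit{Distr}(S_2')} \; \max_{U \subseteq \Sigma^m} \bigl( \mathit{pr}_1(\psi_1, U) - \mathit{pr}_2(\psi_2, U) \bigr) \]

in exponential time. Consider the following linear program,
similar to the one from the proof of Proposition nsi, with
variables $x_u$ for $u \in \Sigma^m$ and variables encoding distributions
$\psi_1, \psi_2$:

\[
\begin{array}{lll}
\text{minimize} & \sum_{u \in \Sigma^m} x_u & \\
\text{subject to:} & \psi_1 \in \mathit{Distr}(S_1') & \\
 & \psi_2 \in \mathit{Distr}(S_2') & \\
 & 0 \le x_u & \text{for all } u \in \Sigma^m \\
 & \mathit{pr}_1(\psi_1, u) - \mathit{pr}_2(\psi_2, u) \le x_u & \text{for all } u \in \Sigma^m
\end{array}
\]

This linear program has exponential size. We show that its
optimal value is $c_{S_1', S_2'}$.

First we show that it has a feasible solution whose value
is $c_{S_1', S_2'}$. Let $\psi_1, \psi_2$ be the distributions that attain the
minimum from the definition of $c_{S_1', S_2'}$. Let $U$ be a set that
attains the maximum from the definition of $c_{S_1', S_2'}$. We can
take $U = \{u \in \Sigma^m \mid \mathit{pr}_1(\psi_1, u) > \mathit{pr}_2(\psi_2, u)\}$. Let
$x_u = \mathit{pr}_1(\psi_1, u) - \mathit{pr}_2(\psi_2, u)$ for all $u \in U$, and let $x_u = 0$ for
all $u \in \Sigma^m \setminus U$. Then the solution with those $x_u$ and with
$\psi_1, \psi_2$ is feasible. Moreover, its value is:

\[
\begin{aligned}
\sum_{u \in \Sigma^m} x_u &= \sum_{u \in U} x_u \\
 &= \sum_{u \in U} \bigl( \mathit{pr}_1(\psi_1, u) - \mathit{pr}_2(\psi_2, u) \bigr) \\
 &= \mathit{pr}_1(\psi_1, U) - \mathit{pr}_2(\psi_2, U) \\
 &= c_{S_1', S_2'}
\end{aligned}
\]

For the converse, we show that $c_{S_1', S_2'}$ is a lower bound to
the value of any feasible solution. Let $(x_u)_{u \in \Sigma^m}$ along with
$\psi_1, \psi_2$ denote a feasible solution. Let $U$ be a set that attains
the maximum in $\max_{U \subseteq \Sigma^m} \bigl( \mathit{pr}_1(\psi_1, U) - \mathit{pr}_2(\psi_2, U) \bigr)$. We
can take $U = \{u \in \Sigma^m \mid \mathit{pr}_1(\psi_1, u) > \mathit{pr}_2(\psi_2, u)\}$. Hence
we have:

\[
\begin{aligned}
\sum_{u \in \Sigma^m} x_u &\ge \sum_{u \in U} x_u && \text{($x_u \ge 0$ from the linear program)} \\
 &\ge \sum_{u \in U} \bigl( \mathit{pr}_1(\psi_1, u) - \mathit{pr}_2(\psi_2, u) \bigr) && \text{($x_u \ge \mathit{pr}_1(\psi_1, u) - \mathit{pr}_2(\psi_2, u)$ from the lin. program)} \\
 &= \mathit{pr}_1(\psi_1, U) - \mathit{pr}_2(\psi_2, U) \\
 &= \max_{U \subseteq \Sigma^m} \bigl( \mathit{pr}_1(\psi_1, U) - \mathit{pr}_2(\psi_2, U) \bigr) \\
 &\ge c_{S_1', S_2'} && \text{(definition of $c_{S_1', S_2'}$)}
\end{aligned}
\]

We conclude that $c_{S_1', S_2'}$ is the optimal value of the linear
program. □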
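
The key step in the linear-programming argument is that, for fixed distributions, the maximum of $\mathit{pr}_1(\psi_1, U) - \mathit{pr}_2(\psi_2, U)$ over all sets $U$ is attained by the set of words on which the first probability is strictly larger, so it equals the sum of the positive parts of the differences. The following Python sketch checks this on a small made-up example. The names `p1`, `p2`, `max_gap_bruteforce` and `max_gap_positive_parts` are illustrative and not from the paper, the two dictionaries stand in for $\mathit{pr}_1(\psi_1, \cdot)$ and $\mathit{pr}_2(\psi_2, \cdot)$ on words of length 2, and the outer minimization over $\psi_1, \psi_2$ is not modeled.

```python
from fractions import Fraction as F
from itertools import combinations


def max_gap_bruteforce(p1, p2):
    """Maximum over all subsets U of sum_{u in U} (p1[u] - p2[u])."""
    words = list(p1)
    best = F(0)  # the empty set U contributes 0
    for k in range(1, len(words) + 1):
        for subset in combinations(words, k):
            best = max(best, sum(p1[u] - p2[u] for u in subset))
    return best


def max_gap_positive_parts(p1, p2):
    """Sum of the positive parts of p1[u] - p2[u].

    This is the value obtained by setting x_u = max(p1[u] - p2[u], 0),
    i.e. the smallest x_u allowed by the two constraint families.
    """
    return sum(max(p1[u] - p2[u], F(0)) for u in p1)


# Hypothetical word probabilities for fixed psi_1, psi_2 (assumed values).
p1 = {"aa": F(1, 2), "ab": F(1, 4), "ba": F(1, 8), "bb": F(1, 8)}
p2 = {"aa": F(1, 4), "ab": F(1, 4), "ba": F(1, 4), "bb": F(1, 4)}

gap_bruteforce = max_gap_bruteforce(p1, p2)
gap_positive_parts = max_gap_positive_parts(p1, p2)
```

In this example only the word `aa` has a positive difference, so both functions return 1/4. The brute-force enumeration is exponential in the number of words and is only meant as a cross-check on tiny inputs.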

C. PROOFS OF SECTION 6 

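We prove Proposition 18 from the main text.

Proposition 18. Let $H$ be a cHMC with
$\mathcal{P}(\mathit{Bad}), \mathcal{P}(\mathit{Good}) > 0$. Then one can compute, in polynomial
time, HMCs $H_1, H_2$ such that for all measurable events
$E \subseteq \Sigma^\omega$ we have

\[ \mathcal{P}_1(E) = \mathcal{P}(E \mid \mathit{Bad}) \quad \text{and} \quad \mathcal{P}_2(E) = \mathcal{P}(E \mid \mathit{Good}). \]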


Proof. By symmetry, it suffices to provide the construction
for $H_1$. Let $H = (G, O, s_0, \mathit{Class})$ be the given cHMC
with $G = (S, R, \phi)$ a Markov chain. Define

\[ S_1 := \{ s \in S \mid \mathcal{P}_s(\mathit{Bad}_s) > 0 \}. \]

Note that $s_0 \in S_1$. Define $G_1 := (S_1, R_1, \phi_1)$ with $R_1 :=
R \cap (S_1 \times S_1)$ and

\[ \phi_1(s, t) := \frac{\phi(s, t) \cdot \mathcal{P}_t(\mathit{Bad}_t)}{\mathcal{P}_s(\mathit{Bad}_s)} \quad \text{for all } (s, t) \in R_1. \]

Finally, take $H_1 := (G_1, O_1, s_0)$ where $O_1$ equals $O$
restricted to $S_1$.

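We show that the measures $\mathcal{P}_1(\cdot)$ and $\mathcal{P}(\cdot \mid \mathit{Bad})$ are equal.
By definition, it suffices to show that they are equal on the
cylinder sets $\{s_0 r\} S_1^\omega$ for all $r \in S^*$. We show by induction
on the length of $r$ that

\[ \mathcal{P}_{1,s}(\{sr\} S_1^\omega) \cdot \mathcal{P}_s(\mathit{Bad}_s) = \mathcal{P}_s(\{sr\} S_1^\omega \cap \mathit{Bad}_s) \quad \forall s \in S_1. \]

For the induction base, let $r$ be empty. Then the claim
follows from $\mathcal{P}_{1,s}(\{s\} S_1^\omega) = 1$ and $\mathit{Bad}_s \subseteq \{s\} S_1^\omega$. For the
induction step, let $t \in S_1$ and $r \in S^*$. We want to show:

\[ \mathcal{P}_{1,s}(\{str\} S_1^\omega) \cdot \mathcal{P}_s(\mathit{Bad}_s) = \mathcal{P}_s(\{str\} S_1^\omega \cap \mathit{Bad}_s) \tag{18} \]

If $(s, t) \notin R_1$ then both sides of (18) are zero. So let $(s, t) \in
R_1$. Then we have:

\[
\begin{aligned}
\mathcal{P}_{1,s}(\{str\} S_1^\omega) \cdot \mathcal{P}_s(\mathit{Bad}_s)
 &= \phi_1(s, t) \cdot \mathcal{P}_{1,t}(\{tr\} S_1^\omega) \cdot \mathcal{P}_s(\mathit{Bad}_s) \\
 &= \frac{\phi(s, t) \cdot \mathcal{P}_t(\mathit{Bad}_t)}{\mathcal{P}_s(\mathit{Bad}_s)} \cdot \mathcal{P}_{1,t}(\{tr\} S_1^\omega) \cdot \mathcal{P}_s(\mathit{Bad}_s) \\
 &= \phi(s, t) \cdot \mathcal{P}_t(\{tr\} S_1^\omega \cap \mathit{Bad}_t) && \text{by the ind. hyp.} \\
 &= \mathcal{P}_s(\{str\} S_1^\omega \cap \mathit{Bad}_s)
\end{aligned}
\]

This shows (18) and hence the proposition. □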
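
The construction of $G_1$ can be illustrated with a small Python sketch. It assumes the values $\mathcal{P}_s(\mathit{Bad}_s)$ are already known and passed in as a dictionary; how the paper computes them is not shown here. The function name `condition_on_bad`, the three-state chain and its probabilities are made up for illustration: state 0 loops with probability 1/2 and moves to an absorbing bad state 1 or an absorbing good state 2 with probability 1/4 each, which gives $\mathcal{P}_0(\mathit{Bad}_0) = 1/2$.

```python
from fractions import Fraction as F


def condition_on_bad(phi, bad_prob):
    """Return (S1, phi1) as in the construction of G_1.

    phi      -- dict mapping (s, t) to the transition probability phi(s, t)
    bad_prob -- dict mapping s to P_s(Bad_s), assumed to be precomputed
    """
    # S_1 keeps exactly the states from which Bad has positive probability.
    S1 = {s for s, p in bad_prob.items() if p > 0}
    # phi_1(s, t) = phi(s, t) * P_t(Bad_t) / P_s(Bad_s) on R restricted to S_1.
    phi1 = {
        (s, t): p * bad_prob[t] / bad_prob[s]
        for (s, t), p in phi.items()
        if p > 0 and s in S1 and t in S1
    }
    return S1, phi1


# Hypothetical example chain (not from the paper).
phi = {
    (0, 0): F(1, 2),
    (0, 1): F(1, 4),
    (0, 2): F(1, 4),
    (1, 1): F(1),
    (2, 2): F(1),
}
bad_prob = {0: F(1, 2), 1: F(1), 2: F(0)}

S1, phi1 = condition_on_bad(phi, bad_prob)

# Each row of phi1 should again be a probability distribution.
row_sums = {s: sum(p for (a, _), p in phi1.items() if a == s) for s in S1}
```

The rows of `phi1` sum to 1 because $\mathcal{P}_s(\mathit{Bad}_s) = \sum_t \phi(s, t) \cdot \mathcal{P}_t(\mathit{Bad}_t)$, so dividing by $\mathcal{P}_s(\mathit{Bad}_s)$ renormalizes the reweighted transitions. Exact rational arithmetic is used so that this can be checked without rounding error.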



